
Improved Source Coding Exponents 
via Witsenhausen's Rate 

Benjamin G. Kelly and Aaron B. Wagner 
School of Electrical and Computer Engineering 
Cornell University 
Ithaca, NY 14853 
bgk6@cornell.edu, wagner@ece.cornell.edu

Abstract 

We provide a novel upper-bound on Witsenhausen's rate, the rate required in the zero-error analogue of the 
Slepian-Wolf problem; our bound is given in terms of a new information-theoretic functional defined on a certain 
graph. We then use the functional to give a single letter lower-bound on the error exponent for the Slepian-Wolf 
problem under the vanishing error probability criterion, where the decoder has full (i.e. unencoded) side information. 
Our exponent stems from our new encoding scheme which makes use of source distribution only through the 
positions of the zeros in the 'channel' matrix connecting the source with the side information, and in this sense 
is 'semi-universal'. We demonstrate that our error exponent can beat the 'expurgated' source-coding exponent of
Csiszár and Körner, achievability of which requires the use of a non-universal maximum-likelihood decoder. An
extension of our scheme to the lossy case (i.e. Wyner-Ziv) is given. For the case when the side information is a 
deterministic function of the source, the exponent of our improved scheme agrees with the sphere-packing bound 
exactly (thus determining the reliability function). An application of our functional to zero-error channel capacity 
is also given. 

I. Introduction 

Under consideration is the communication problem depicted in Figure 1: nature produces a sequence
$(X_i, Y_i)$ governed by the i.i.d. distribution $P_{XY}$ on alphabet $\mathcal{X} \times \mathcal{Y}$. An encoder, observing the sequence
$X^n$, must send a message to a decoder, observing the sequence $Y^n$ (the side information), so that the
decoder can use the message and its observation to generate $\hat{X}^n$, an estimate of $X^n$ to some desired
fidelity.

For lossless reproduction, using the criterion that $P^n_{XY}(\hat{X}^n \neq X^n) \to 0$ as the blocklength $n \to \infty$,
Slepian and Wolf [1] determined that all rates in excess of $H(X|Y)$ are achievable. Bounds on the rate of
decay of the error probability for this problem, the so-called error exponent, were determined by Csiszár
and Körner [2], whose results include a universally attainable random coding exponent and a non-universal
'expurgated' exponent. Previously Gallager [3] derived a non-universal exponent that was later shown to
be universally attainable by Csiszár, Körner and Marton [4]. For the Slepian-Wolf problem in its full
generality (i.e. allowing for coded side information) the best known exponents are those of Csiszár [5]
and Oohama and Han [6]. In the regime where the rate of the second encoder is large, our new exponent
also improves upon these results, but we do not consider the general case here.

In the case of lossy reproduction, with the loss measured by some single letter distortion function $d$, the
scenario is known as the Wyner-Ziv problem [7], after Wyner and Ziv, who showed that if the allowable
expected distortion is $\Delta$, then the required rate is given by

$$R_{WZ}(P_{XY}, \Delta) = \inf\, I(X; U) - I(Y; U),$$

where the infimum is over all auxiliary random variables $U$ such that (1) $U$, $X$, and $Y$ form a Markov
chain in this order and (2) there exists a function $\phi$ such that

$$E[d(X, \phi(Y, U))] \le \Delta.$$

Fig. 1. Source coding with full side information



The best available exponents for the Wyner-Ziv problem were determined by the present authors in [8].
Henceforth we refer to both lossless and lossy problems as full side information problems. 

We describe new encoding schemes for both full side information problems which rely on ideas 
from graph theory. Our analysis shows that the chromatic number of a particular graph can be used 
to characterize the number of sequences that can be communicated without error. We are able to give 
a single letter upper bound on this chromatic number via a new functional on a graph G. We call our
schemes semi-universal because the scheme depends on the source distribution only through the position 
of the zeroes in the channel matrix. By comparing our new exponent directly with the previous results 
one sees that our scheme is capable of sending a larger number of sequences without error, i.e. we can 
expurgate more types which leads to better exponents. 

Although our scheme applies to the vanishing error probability case, it is derived from the study of a 
related zero-error problem. The zero-error formulation of source coding with full side information was 
studied by Witsenhausen [9], who showed that for fixed blocklength $n$, the smallest number of messages
required so that the decoder can reproduce the source with no error, i.e. $P^n_{XY}(\hat{X}^n = X^n) = 1$, is $\gamma(G_X^n)$,
the chromatic number of the $n$-fold strong product of the characteristic graph of the source.

The required rate, sometimes referred to as Witsenhausen's rate in the literature, is therefore

$$R(G) = \lim_{n \to \infty} \frac{1}{n} \log \gamma(G^n). \quad (1)$$

(We note that the limit in (1) exists by sub-additivity and Fekete's lemma.) Unfortunately, the
problem of determining $R(G)$ 'seems, in general, far beyond the reach of existing techniques' [10];
see also the comment at the end of Section IV here. However, since $\gamma(G^n) \le \gamma(G)^n$, it is clear that
$R(G) \le \log \gamma(G)$. We provide a new bound on $R(G)$ by bounding the chromatic number of $G^n$ restricted to
type classes. Our approach combines graph-theoretic and information-theoretic techniques; see Körner and Orlitsky
[11] for a comprehensive overview of the applications of graph theory in zero-error information theory.
The rest of the paper is organized as follows. Section II gives definitions. Section III gives some
useful properties of $\kappa$. In Section IV we motivate $\kappa$ and give our first result, a single letter bound on
Witsenhausen's rate. In Section V we give our second result, improved error exponents for the problem of
lossless source coding with full side information; examples and comparisons to previously known exponents
are also given. In Section VI we use the ideas from Section V to give our third and fourth results, an
improved error exponent for the lossy problem and determination of the reliability function for the case
when the side information is a deterministic function of the source. In Section VII we briefly give an
application of $\kappa$ to channel coding.



II. Definitions 

Script letters, e.g. $\mathcal{X}$, $\mathcal{Y}$, denote alphabets. The set of all probability distributions over an alphabet $\mathcal{X}$
will be denoted by $\mathcal{P}(\mathcal{X})$. Small bold-faced letters, e.g. $\mathbf{x} \in \mathcal{X}^n$, $\mathbf{y} \in \mathcal{Y}^n$, denote vectors; usually the
alphabet and length are clear from the context. For information-theoretic quantities, we use the notation
of [12]. $H(\mathbf{x}|\mathbf{y})$ denotes conditional empirical entropy, i.e. the conditional entropy computed using the
empirical distribution $P_{\mathbf{x},\mathbf{y}}$. We use $[x]^+$ to denote $\max(0, x)$. Unless specified, exponents and logarithms
are taken in base 2.

A graph $G = (V, E)$ is a pair of sets, where $V$ is the set of vertices and $E \subseteq V \times V$ is the set of
edges. Two vertices $x, y \in V$ are connected iff $(x, y) \in E$. We will restrict ourselves to simple graphs,
i.e. undirected graphs without self-loops. The degree of a vertex $v$, $\Delta(v)$, is the number of other vertices
to which $v$ is connected. The degree of a graph $G$, denoted $\Delta(G)$, is defined as $\max_{v \in V} \Delta(v)$. A coloring
of a graph is an assignment of colors to vertices so that no pair of adjacent vertices share the same color.
The chromatic number of $G$, $\gamma(G)$, is defined to be the smallest number of colors needed to color $G$. For
$U \subseteq V$, $G(U)$ is the (vertex-) induced subgraph, i.e. the graph with vertex set $U$ and edge set $E \cap (U \times U)$.
For two matrices $V$, $W$ we use $V \ll W$ to mean that $W(b|a) = 0$ implies $V(b|a) = 0$.

Let $G = (V, E)$, $H = (V', E')$ be two graphs. The strong product (or AND product) $G \wedge H$ is a graph
whose vertex set is $V \times V'$ and in which two vertices $(v, v')$, $(u, u')$ are connected iff

1) $v = u$ and $(v', u') \in E'$, or

2) $v' = u'$ and $(v, u) \in E$, or

3) $(v, u) \in E$ and $(v', u') \in E'$.

We will be interested in $G^n = G \wedge G \wedge \ldots \wedge G$ ($n$ factors), the $n$-fold strong product of $G$. One may think
of the vertices of $G^n$ as length-$n$ vectors $(v_1, \ldots, v_n)$, with two distinct vertices connected in $G^n$ if, in every
position, the components of the two vectors are either equal or connected in $G$. The characteristic graph, $G_X$, of a source
$P_{XY}$ is the graph whose vertex set is $\mathcal{X}$ and in which two vertices $x$, $x'$ are connected if there is a $y \in \mathcal{Y}$ such that
$P(y|x')P(y|x) > 0$. For a given $\mathbf{y}$, the set $Z(\mathbf{y}) = \{\mathbf{x} : P(\mathbf{x}|\mathbf{y}) > 0\}$ is the set of 'confusable' sequences,
i.e. the set of $\mathbf{x}$s that can occur with a given $\mathbf{y}$. For a graph $G$ and distribution $Q$ on the vertices of $G$,
we define the following functional.
Definition 1:

$$\kappa(G, Q) = \max_{V :\, V \ll G,\ QV = Q} H(V|Q). \quad (2)$$

Note that when we write the graph $G$ where a matrix is expected, we abuse notation and refer to the matrix
$G = A + I$, where $A$ is the adjacency matrix of the graph $G$ and $I$ is the identity matrix.
Equivalently, one may think of $\kappa$ as follows:

$$\kappa(G, Q) = \max_{X, \hat{X} :\, Q_X = Q_{\hat{X}} = Q} H(\hat{X}|X),$$

where $X$ and $\hat{X}$ have a common alphabet and $P(\hat{x}|x) > 0$ only if $\hat{x} = x$ or $(x, \hat{x}) \in E(G)$.
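As a rough illustration of Definition 1, the sketch below checks whether a candidate channel satisfies the constraints of the maximization (support contained in $A + I$, and output distribution equal to $Q$) and evaluates $H(V|Q)$ for it. Any feasible channel gives a lower bound on the functional; the sketch does not solve the maximization itself. The function names, the path graph and the candidate channels are hypothetical choices made for this example only.

```python
from math import log2

def cond_entropy(V, Q):
    """H(V|Q) = -sum_a Q[a] sum_b V[a][b] log2 V[a][b], with V[a][b] = V(b|a)."""
    return -sum(Q[a] * p * log2(p) for a, row in enumerate(V) for p in row if p > 0)

def is_feasible(V, G, Q, tol=1e-9):
    """Check the constraints of the maximization: V is a channel, V << G, QV = Q."""
    n = len(Q)
    rows_ok = all(min(row) >= 0 and abs(sum(row) - 1) < tol for row in V)
    support_ok = all(G[a][b] == 1 or V[a][b] == 0 for a in range(n) for b in range(n))
    output = [sum(Q[a] * V[a][b] for a in range(n)) for b in range(n)]
    marginal_ok = all(abs(output[b] - Q[b]) < tol for b in range(n))
    return rows_ok and support_ok and marginal_ok

# Hypothetical example: the path graph 0 - 1 - 2, written as the matrix A + I.
G = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
Q = [1 / 3, 1 / 3, 1 / 3]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
V = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]  # uses the missing edge (0, 2)

lower_bound = cond_entropy(V, Q)  # a lower bound on the functional for this (G, Q)
```

The identity channel is always feasible and gives entropy zero, so the functional is nonnegative; the uniform channel is rejected here because vertices 0 and 2 are not connected.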



III. Properties of κ

In this section we give some properties of $\kappa$ which will be used elsewhere in the paper. Throughout this
section $G$ is a graph, $Q$ is a distribution on the vertices of $G$ and $X$ is a random variable with distribution
$Q$.

Property 1: $\kappa(G, Q) \le H(Q) = H(X)$, where equality holds if $G$ is fully connected.

Proof: Note that any valid choice of channel in the optimization defining $\kappa(G, Q)$ satisfies $QV = Q$,
so the channel output has distribution $Q$. Since conditioning cannot increase entropy, $H(V|Q) \le H(Q)$, giving the first claim.

If $G$ is fully connected then the constraint $V \ll G$ imposes no restriction on the choice of $V$. The
problem is then to choose a $V$ that produces the given output distribution $Q$. Setting the rows of $V$ equal
to $Q$ gives $\kappa(G, Q) = H(Q)$. ■

Property 2: If $G$ is the disjoint union of fully connected subgraphs then

$$\kappa(G, Q) = H(X|Y), \quad (3)$$

where

1) $Y$ is a random variable with alphabet size $|\mathcal{Y}|$ equal to the number of disjoint subgraphs in $G$, so
that to each subgraph we associate a unique element $y \in \mathcal{Y}$; and

2) for the subgraph associated with $y$, the event $\{X = a, Y = y\}$ has probability $Q(a)$ if $a$ is in the
subgraph and probability zero otherwise.

Proof: Without loss of generality we may assume the adjacency matrix of $G$ plus the identity
matrix is block diagonal, where each block corresponds to a fully connected subgraph (i.e. is all 1s).
By independence it suffices to solve the maximization problem for one of these blocks, say the one
associated with element $y$.

Suppose that the subgraph has vertices $a_1, a_2, \ldots, a_n$ and define the (semi) probability measure $Q_y =
[Q(a_1)\ Q(a_2)\ \cdots\ Q(a_n)]$. Then the problem is

$$\max_{V :\, Q_y V = Q_y} \sum_a Q_y(a) \sum_b -V(b|a) \log V(b|a). \quad (4)$$

Let $\bar{Q}_y = Q_y / \|Q_y\|$. The maximizing $V$ is unchanged if we replace the problem by

$$\max_{V :\, \bar{Q}_y V = \bar{Q}_y} \frac{1}{\|Q_y\|} \sum_a Q_y(a) \sum_b -V(b|a) \log V(b|a) = \max_{V :\, \bar{Q}_y V = \bar{Q}_y} H(V|\bar{Q}_y).$$

We now use the proof of Property 1 to conclude that setting the rows of $V$ to be $\bar{Q}_y$ solves this
maximization. Using the definition of $Y$ to see that $\|Q_y\| = P(Y = y)$ and substituting the maximizing
$V$, equation (4) becomes

$$\sum_a Q_y(a) \sum_b -\bar{Q}_y(b) \log \bar{Q}_y(b) = P(Y = y) H(\bar{Q}_y) = P(Y = y) H(X|Y = y).$$

Summing over the subgraphs gives the result. ■
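The construction in this proof can be checked numerically. The sketch below builds the block channel whose rows, within each fully connected subgraph, equal $Q$ renormalized on that subgraph, verifies that it reproduces $Q$ at the output, and compares $H(V|Q)$ with $H(X|Y)$. It only confirms that this particular channel attains $H(X|Y)$; it does not verify maximality. The block structure and the distribution are hypothetical values chosen for illustration.

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * log2(x) for x in p if x > 0)

def block_channel(blocks, Q):
    """Within each clique, every row of V equals Q renormalised on that clique."""
    V = [[0.0] * len(Q) for _ in Q]
    for block in blocks:
        mass = sum(Q[a] for a in block)  # equals P(Y = y); assumed positive here
        for a in block:
            for b in block:
                V[a][b] = Q[b] / mass
    return V

def cond_entropy(V, Q):
    """H(V|Q) = sum_a Q[a] * H(V(.|a))."""
    return sum(Q[a] * entropy(V[a]) for a in range(len(Q)))

def h_x_given_y(blocks, Q):
    """H(X|Y) when Y is the index of the clique containing X."""
    total = 0.0
    for block in blocks:
        mass = sum(Q[a] for a in block)
        total += mass * entropy([Q[a] / mass for a in block])
    return total

# Hypothetical graph: two disjoint cliques on vertices {0, 1} and {2, 3, 4}.
blocks = [[0, 1], [2, 3, 4]]
Q = [0.1, 0.3, 0.2, 0.2, 0.2]
V = block_channel(blocks, Q)
output = [sum(Q[a] * V[a][b] for a in range(len(Q))) for b in range(len(Q))]
```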
Property 3: Let $G$ be a graph and $Q^{(n)}$ be a sequence of distributions (on the vertices of $G$) converging
to a distribution $Q^{\infty}$. Then

$$\limsup_{n \to \infty} \kappa(G, Q^{(n)}) \le \kappa(G, Q^{\infty}).$$

(I.e. $\kappa(G, \cdot)$ is upper semicontinuous in $Q$ for a fixed $G$.)
Proof: Let

$$V^{(n)} = \arg\max_{V :\, V \ll G,\ Q^{(n)} V = Q^{(n)}} H(V|Q^{(n)}),$$

where $V^{(n)}$ exists because we are maximizing a continuous function over a compact set. By choosing
a subsequence and relabeling we may arrange it so that $H(V^{(n)}|Q^{(n)}) \to \limsup H(V^{(n)}|Q^{(n)})$ and
$V^{(n)} \to V^{\infty}$, where both $V^{\infty} \ll G$ and $Q^{\infty} V^{\infty} = Q^{\infty}$ hold true, in which case

$$\limsup_{n \to \infty} \kappa(G, Q^{(n)}) = \limsup_{n \to \infty} H(V^{(n)}|Q^{(n)}) = H(V^{\infty}|Q^{\infty}) \le \kappa(G, Q^{\infty}).$$ ■



IV. Bounding Witsenhausen's Rate 

We recall that in Witsenhausen's problem [9] the goal is communication of $X^n$ to the decoder, who
has access to $Y^n$, under the criterion $P^n_{XY}(\hat{X}^n = X^n) = 1$. This requirement is stricter than the vanishing
error probability criterion of Slepian-Wolf and increases the rate from $H(X|Y)$ to $R(G_X)$. Witsenhausen's
scheme is as follows: the decoder sees $Y^n$, a realization of the side information, and can identify the set
$Z(Y^n)$; this set forms a fully connected subgraph in $G_X^n$, since any two of its members are confusable. If the vertices of $G_X^n$ are colored then the encoder can send
the color of the source sequence to the decoder, which can then uniquely identify the source sequence in $Z(Y^n)$. A result of
[9] proves that when encoding blocks of length $n$, $\gamma(G_X^n)$ is the smallest size of the signaling set possible.

When considering very large blocklengths, the fact that there are only polynomially many types means
we can send the type essentially for free. A possible modification of Witsenhausen's scheme is as follows.
First, fix the blocklength $n$ and for every type $Q_X$, the encoder and decoder agree on a coloring of the
graph $G_X^n(T^n_{Q_X})$ using $\gamma(G_X^n(T^n_{Q_X}))$ colors. The encoder and decoder operate as follows.

Encoder: The encoder first communicates $Q_{\mathbf{x}}$, the type of the source sequence. Next the encoder looks
at the graph $G_X^n(T^n_{Q_{\mathbf{x}}})$, that is the subgraph of $G_X^n$ induced by $T^n_{Q_{\mathbf{x}}}$, and sends the color of vertex $\mathbf{x}$ to the
decoder.

Decoder: The decoder sees side information $\mathbf{y}$ and identifies the set $Z(\mathbf{y})$. Knowing the type, the decoder
can examine the induced subgraph $G_X^n(T^n_{Q_{\mathbf{x}}} \cap Z(\mathbf{y}))$ and, using the color from the encoder, identify the
source sequence.
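The type-then-color scheme just described can be sketched for a toy source. The code below is only an illustration under stated assumptions: the source support is a hypothetical example, and a greedy coloring is swapped in for an optimal one, so the number of colors used may exceed the chromatic number of the induced subgraph. What it does preserve is the zero-error property, because sequences compatible with the same side information are adjacent and therefore receive different colors within a type class.

```python
from collections import Counter
from itertools import count, product

def confusable(u, v, support):
    """Adjacency in the strong power: each coordinate pair shares a side-information symbol."""
    return all(support[a] & support[b] for a, b in zip(u, v))

def build_codebook(alphabet, support, n):
    """Map each length-n sequence to (type, colour), colouring each type class greedily."""
    classes = {}
    for x in product(alphabet, repeat=n):
        sequence_type = tuple(sorted(Counter(x).items()))
        classes.setdefault(sequence_type, []).append(x)
    book = {}
    for sequence_type, members in classes.items():
        for x in members:
            # Colours already taken by previously coloured neighbours in this type class.
            used = {book[z][1] for z in members
                    if z in book and confusable(x, z, support)}
            book[x] = (sequence_type, next(c for c in count() if c not in used))
    return book

def decode(message, y, book, support):
    """All sequences consistent with the received (type, colour) and side information y."""
    return [x for x, m in book.items()
            if m == message and all(b in support[a] for a, b in zip(x, y))]

# Hypothetical source: support[x] is the set of y with P(x, y) > 0.
alphabet = [0, 1, 2]
support = {0: {0}, 1: {0, 1}, 2: {1}}
book = build_codebook(alphabet, support, n=2)
```

For every source sequence and every side-information sequence compatible with it, `decode` returns a list containing exactly that source sequence.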

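The following lemma shows that this scheme is asymptotically optimal.
Lemma 1:

$$R(G_X) = \lim_{n \to \infty} \max_{Q_X \in \mathcal{P}_n(\mathcal{X})} \frac{1}{n} \log \gamma(G_X^n(T^n_{Q_X})). \quad (5)$$

Proof: The number of bits used by our scheme is an upper bound on $R(G_X)$ and hence

$$R(G_X) \le \liminf_{n \to \infty} \left[ \frac{\log (n+1)^{|\mathcal{X}|}}{n} + \max_{Q_X \in \mathcal{P}_n(\mathcal{X})} \frac{\log \gamma(G_X^n(T^n_{Q_X}))}{n} \right] = \liminf_{n \to \infty} \max_{Q_X \in \mathcal{P}_n(\mathcal{X})} \frac{\log \gamma(G_X^n(T^n_{Q_X}))}{n}.$$

But trivially we also have

$$R(G_X) \ge \limsup_{n \to \infty} \max_{Q_X \in \mathcal{P}_n(\mathcal{X})} \frac{\log \gamma(G_X^n(T^n_{Q_X}))}{n},$$

where we used the fact that the chromatic number of the subgraph is at most the chromatic number of
$G_X^n$. ■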

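We now bound the chromatic number of the induced subgraph in two steps. First we give a degree
bound on the induced subgraph.

Lemma 2: Let $Q_X \in \mathcal{P}_n(\mathcal{X})$. Then

$$(n+1)^{-|\mathcal{X}||\mathcal{X}|} \exp(n\kappa_n(G_X, Q_X)) - 1 \le \Delta(G_X^n(T^n_{Q_X})) \le (n+1)^{|\mathcal{X}||\mathcal{X}|} \exp(n\kappa_n(G_X, Q_X)), \quad (6)$$

where

$$\kappa_n(G_X, Q_X) = \max_{V :\, Q_X \times V \in \mathcal{P}_n(\mathcal{X} \times \mathcal{X}),\ V \ll G_X,\ Q_X V = Q_X} H(V|Q_X). \quad (7)$$

Note: $\kappa_n$ maximizes over types rather than distributions, but of course we may replace $\kappa_n$ by $\kappa$ in the
right-hand inequality of (6) to get another valid upper bound.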

Proof: Suppose $\mathbf{x} \in T^n_{Q_X}$, and let $W(\mathbf{x})$ denote the neighbors of $\mathbf{x}$ in the induced subgraph $G_X^n(T^n_{Q_X})$.
We partition the set $\{(\mathbf{x}, \mathbf{x}') : \mathbf{x}' \in W(\mathbf{x})\}$ by joint type $Q_{\mathbf{x}\mathbf{x}'}$ and observe that each joint type can
be written as $Q_X \times V$ for some $V$. One may verify that $V \ll G_X$. One also sees that $Q_X V = Q_X$, since
$(\mathbf{x}, \mathbf{x}') \in T^n_{Q_{\mathbf{x}\mathbf{x}'}} \cap E(G_X^n(T^n_{Q_X}))$ implies $Q_{\mathbf{x}'} = Q_X$, and writing $Q_{\mathbf{x}\mathbf{x}'} = Q_X \times V$ tells us that $Q_X V = Q_X$.

For any $\mathbf{x} \in T^n_{Q_X}$ we can count the number of strings in $W(\mathbf{x})$ by decomposing $\{(\mathbf{x}, \mathbf{x}') : \mathbf{x}' \in W(\mathbf{x})\}$
into joint types, choosing a $V$ for each joint type and using the standard cardinality bounds for type
classes. Thus

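$$\begin{aligned}
\Delta(G_X^n(T^n_{Q_X})) &\le \sum_{V :\, V \ll G_X,\ Q_X V = Q_X} |T_V(\mathbf{x})| \\
&\le \sum_{V :\, V \ll G_X,\ Q_X V = Q_X} \exp(nH(V|Q_X)) \\
&\le (n+1)^{|\mathcal{X}||\mathcal{X}|} \max_{V :\, V \ll G_X,\ Q_X V = Q_X} \exp(nH(V|Q_X)).
\end{aligned}$$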

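For the reverse inequality, we let $\Delta(\mathbf{x})$ denote the degree of vertex $\mathbf{x}$ in the induced subgraph. Then

$$\Delta(\mathbf{x}) = \sum_{V :\, Q_X \times V \in \mathcal{P}_n,\ V \ne I,\ V \ll G_X,\ Q_X V = Q_X} |T_V(\mathbf{x})|.$$

To see this, note first that if $V$ arises by selecting an $\mathbf{x}' \in W(\mathbf{x})$, then $T_V(\mathbf{x}) \subseteq W(\mathbf{x})$. And second,
that any $V \ne I$ with $V \ll G_X$ and $Q_X V = Q_X$ gives rise to a neighbor. Then because $\Delta(G_X^n(T^n_{Q_X})) =
\max_{\mathbf{x} \in T^n_{Q_X}} \Delta(\mathbf{x})$, we have

$$\begin{aligned}
\Delta(G_X^n(T^n_{Q_X})) &= \max_{\mathbf{x} \in T^n_{Q_X}} \sum_{V :\, Q_X \times V \in \mathcal{P}_n,\ V \ne I,\ V \ll G_X,\ Q_X V = Q_X} |T_V(\mathbf{x})| \\
&= \max_{\mathbf{x} \in T^n_{Q_X}} \sum_{V :\, Q_X \times V \in \mathcal{P}_n,\ V \ll G_X,\ Q_X V = Q_X} |T_V(\mathbf{x})| - 1.
\end{aligned}$$

Using the cardinality bound for type classes we get

$$\begin{aligned}
\Delta(G_X^n(T^n_{Q_X})) &\ge \max_{\mathbf{x} \in T^n_{Q_X}} \max_{V :\, V \ll G_X,\ Q_X V = Q_X} |T_V(\mathbf{x})| - 1 \\
&\ge \max_{\mathbf{x} \in T^n_{Q_X}} (n+1)^{-|\mathcal{X}||\mathcal{X}|} \max_{V :\, V \ll G_X,\ Q_X V = Q_X} \exp(nH(V|Q_X)) - 1 \\
&= (n+1)^{-|\mathcal{X}||\mathcal{X}|} \max_{V :\, V \ll G_X,\ Q_X V = Q_X} \exp(nH(V|Q_X)) - 1,
\end{aligned}$$

where we implicitly assumed we still have $Q_X \times V \in \mathcal{P}_n$. ■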
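The standard cardinality bounds for type classes used in this proof can be illustrated in their simplest, unconditional form: a type class with empirical entropy $H$ contains at most $2^{nH}$ sequences and at least $(n+1)^{-|\mathcal{X}|} 2^{nH}$. The sketch below checks this for one hypothetical type; the proof itself applies the analogous bounds to conditional type classes, which are not reproduced here.

```python
from math import factorial, log2

def type_class_size(counts):
    """Number of sequences with the given symbol counts (a multinomial coefficient)."""
    size = factorial(sum(counts))
    for c in counts:
        size //= factorial(c)
    return size

def type_entropy(counts):
    """Entropy in bits of the empirical distribution with the given counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

counts = (6, 4, 2)                      # hypothetical type on a ternary alphabet, n = 12
n, alphabet_size = sum(counts), len(counts)
size = type_class_size(counts)
upper = 2 ** (n * type_entropy(counts))       # exp(nH) with base-2 exponent
lower = upper / (n + 1) ** alphabet_size      # (n+1)^{-|X|} exp(nH)
```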
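Using the previous lemma we bound $R(G_X)$ as follows.
Theorem 1:

$$R(G_X) \le \max_{Q_X \in \mathcal{P}(\mathcal{X})} \kappa(G_X, Q_X).$$

Proof: A well-known fact from graph theory tells us that $\gamma(G) \le \Delta(G) + 1$ [13, sec. 5.2]. This
combined with the previous lemma gives

$$\begin{aligned}
\max_{Q_X \in \mathcal{P}_n(\mathcal{X})} \frac{\log \gamma(G_X^n(T^n_{Q_X}))}{n}
&\le \max_{Q_X \in \mathcal{P}_n(\mathcal{X})} n^{-1} \log\left[(n+1)^{|\mathcal{X}||\mathcal{X}|} \exp(n\kappa_n(G_X, Q_X)) + 1\right] \\
&\le \max_{Q_X \in \mathcal{P}(\mathcal{X})} n^{-1} \log\left[(n+1)^{|\mathcal{X}||\mathcal{X}|} \exp(n\kappa(G_X, Q_X)) + 1\right],
\end{aligned}$$

where the final line used the fact that in both maximizations we maximize over a larger set. Taking limits
as $n \to \infty$ and applying Lemma 1 gives the result. ■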
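To see why the polynomial factor and the additive one disappear in the limit, the following sketch evaluates the normalized quantity $n^{-1} \log_2[(n+1)^{|\mathcal{X}|^2} 2^{n\kappa} + 1]$ for growing $n$ at a fixed value of the functional. The value 1.0 and the alphabet size 3 are arbitrary illustrative choices, and the function name is hypothetical.

```python
from math import log2

def normalized_bound(n, kappa, alphabet_size):
    """n^{-1} log2[(n+1)^{|X|^2} 2^{n*kappa} + 1], evaluated without overflow."""
    a = alphabet_size ** 2 * log2(n + 1) + n * kappa  # log2 of the first summand
    # log2(2^a + 1) = a + log2(1 + 2^{-a}); the correction is negligible for large a.
    correction = log2(1 + 2.0 ** (-a)) if a < 1000 else 0.0
    return (a + correction) / n

values = [normalized_bound(n, kappa=1.0, alphabet_size=3) for n in (10, 1000, 10 ** 6)]
```

The values decrease toward the chosen value of the functional, with a gap of order $|\mathcal{X}|^2 \log(n+1)/n$.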
We now discuss the tightness of the bound.

A. Tightness of the bound in Theorem 1

We note that the bound given by $\kappa$ on $R(G)$ need not be tight. To see this, consider the graph $G$ with
$V(G) = \{0, 1, \ldots, 2^n\}$ and $E(G) = \{(m, m+1) : m > 0\} \cup \{(0, m) : m > 2\}$. It is clear that $\gamma(G) = 3$ for
all $n$, and hence $R(G) \le \log 3$. Yet, if we choose

$$V(b|0) = \begin{cases} 0 & \text{if } b = 0 \\ 2^{-n} & \text{otherwise,} \end{cases}
\qquad
V(b|a \ne 0) = \begin{cases} 1 & \text{if } b = 0 \\ 0 & \text{otherwise,} \end{cases}$$

$$Q = \left[ \frac{1}{2}, \frac{1}{2^{n+1}}, \frac{1}{2^{n+1}}, \ldots, \frac{1}{2^{n+1}} \right],$$

one sees that $V \ll G$ and therefore that

$$\kappa(G, Q) \ge H(V|Q) = \frac{1}{2} \log 2^n = \frac{n}{2}.$$

Although the gap between $R(G)$ and the bound of Theorem 1 may be arbitrarily large, note that the
bound of Theorem 1 is a convex program, whereas even the computation of $\gamma(G)$ is NP-hard. Hence,
although we do not know whether our bound is ever better than the bound provided by $\gamma(G)$, from a
computational point of view our bound has an advantage.
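The arithmetic in the example of this subsection can be checked numerically. The sketch below assumes the following reading of the example: the channel sends input 0 uniformly to the $2^n$ nonzero vertices and every nonzero input to vertex 0, and $Q$ places mass $1/2$ on vertex 0 and $2^{-(n+1)}$ on each other vertex. Under that assumption it confirms that $QV = Q$ and that $H(V|Q) = n/2$, which exceeds $\log 3$ once $n \ge 4$. It does not check the support condition against the edge set.

```python
from math import log2

def tightness_example(n):
    """Q and V under the assumed reading of the example, on vertices 0, 1, ..., 2^n."""
    size = 2 ** n + 1
    Q = [0.5] + [2.0 ** -(n + 1)] * (size - 1)
    V = [[0.0] + [2.0 ** -n] * (size - 1)]                       # row for input 0
    V += [[1.0] + [0.0] * (size - 1) for _ in range(size - 1)]   # rows for inputs a != 0
    return Q, V

def cond_entropy(V, Q):
    """H(V|Q) in bits, with V[a][b] = V(b|a)."""
    return -sum(Q[a] * p * log2(p) for a, row in enumerate(V) for p in row if p > 0)

n = 4
Q, V = tightness_example(n)
output = [sum(Q[a] * V[a][b] for a in range(len(Q))) for b in range(len(Q))]
h = cond_entropy(V, Q)
```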

V. Improved Exponents for Lossless Source Coding 

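We consider the same setup as in Figure 1. The encoder/decoder pair are functions $\psi : \mathcal{X}^n \to \mathcal{M}$ and
$\varphi : \mathcal{M} \times \mathcal{Y}^n \to \mathcal{X}^n$, where $\mathcal{M}$ is a fixed set. We define the error probability to be

$$P_e(\psi, \varphi) = P(\hat{X}^n \ne X^n), \quad (8)$$

where $\hat{X}^n = \varphi(\psi(X^n), Y^n)$. In this section we are interested in the asymptotic behaviour of the error
probability $P_e(\psi, \varphi)$ as $n$ gets large. We define the error exponent (or reliability function) to be

$$\theta(R, P_{XY}) = \lim_{\epsilon \downarrow 0} \liminf_{n \to \infty} -\frac{1}{n} \log \left[ \min_{(\psi, \varphi)} P_e(\psi, \varphi) \right], \quad (9)$$

where the minimization ranges over all encoder/decoder pairs satisfying

$$\frac{1}{n} \log |\mathcal{M}| \le R + \epsilon. \quad (10)$$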



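Our main result is

Theorem 2: For any $R > 0$ and $P_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$,

$$\theta(R, P_{XY}) \ge \inf_{Q_{XY} :\, \min(\kappa(G_X, Q_X),\, \log \gamma(G_X)) \ge R} D(Q_{XY} \| P_{XY}) + (R - H_Q(X|Y))^+, \quad (11)$$

where $G_X$ is the characteristic graph of the source $P_{XY}$.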

To achieve this exponent we use the following scheme. First, fix the blocklength $n$. For every type $Q_X$,
the encoder and decoder agree on a coloring of the graph $G_X^n(T^n_{Q_X})$ using $\gamma(G_X^n(T^n_{Q_X}))$ colors. When
$\log \gamma(G_X^n(T^n_{Q_X})) > nR$, the encoder and decoder agree on a random binning of the type class $T^n_{Q_X}$
into $\exp(nR)$ bins. The encoder's message set is

$\mathcal{M} = \mathcal{M}_1 \times \mathcal{M}_2$, where

$\mathcal{M}_1 = \{1, 2, \ldots, \exp(nR)\}$, $\mathcal{M}_2 = \{1, 2, \ldots, (n+1)^{|\mathcal{X}|}\}$.

Encoder: The encoder sends the type $Q_{\mathbf{x}}$ of the string. If $\log \gamma(G_X^n(T^n_{Q_{\mathbf{x}}})) \le nR$, then there is sufficient
rate to send the color to the decoder. If not, the encoder sends the bin index of the string $\mathbf{x}$. In both cases
we let $U(\mathbf{x})$ denote the index sent to the decoder.

Decoder: The decoder receives the index of the type and side information $\mathbf{y}$. If $\log \gamma(G_X^n(T^n_{Q_{\mathbf{x}}})) \le nR$,
the color index and the side information allow the decoder to reproduce $X^n$ without error. In the opposite
case, the decoder receives a bin index, looks in that bin and chooses an $\hat{\mathbf{x}}$ in the bin so that
$H(\hat{\mathbf{x}}|\mathbf{y}) \le H(\tilde{\mathbf{x}}|\mathbf{y})$ for all other $\tilde{\mathbf{x}}$ in the bin.
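The decoding rule for the binning case is a minimum empirical conditional entropy rule, which can be sketched as follows. The function names and the toy bin are hypothetical; ties are broken by taking the first minimizer, which is one of several choices consistent with the rule as stated.

```python
from collections import Counter
from math import log2

def empirical_cond_entropy(x, y):
    """Conditional entropy H(x|y) in bits under the joint empirical distribution of (x, y)."""
    n = len(x)
    joint = Counter(zip(x, y))
    marginal_y = Counter(y)
    return -sum(c / n * log2(c / marginal_y[b]) for (_a, b), c in joint.items())

def min_entropy_decode(bin_contents, y):
    """Pick a sequence in the bin with the smallest empirical conditional entropy given y."""
    return min(bin_contents, key=lambda x: empirical_cond_entropy(x, y))

# Toy bin with two candidates of the same type and a side-information sequence.
y = (0, 0, 1, 1)
bin_contents = [(0, 1, 0, 1), (0, 0, 1, 1)]
decoded = min_entropy_decode(bin_contents, y)
```

Here the candidate `(0, 0, 1, 1)` is a deterministic function of `y` and has empirical conditional entropy zero, while `(0, 1, 0, 1)` has one bit, so the former is selected.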



A. Analysis 

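To prove our theorem, we will use the following definition and lemmas. Let

$$\mathcal{E} = \{(\mathbf{x}, \mathbf{y}) : \log \gamma(G_X^n(T^n_{Q_{\mathbf{x}}})) > nR\}.$$

Observe that on $\mathcal{E}^c$ our scheme makes no error.
Lemma 3: For all strings $\mathbf{x}$, $\mathbf{y}$, let

$$S(\mathbf{x}|\mathbf{y}) = \{\hat{\mathbf{x}} : H(\hat{\mathbf{x}}|\mathbf{y}) \le H(\mathbf{x}|\mathbf{y}),\ Q_{\hat{\mathbf{x}}} = Q_{\mathbf{x}}\}.$$

Then

$$|S(\mathbf{x}|\mathbf{y})| \le (n+1)^{|\mathcal{X}||\mathcal{Y}|} \exp(nH(\mathbf{x}|\mathbf{y})).$$

Proof: Summing over the conditional types $V \in \mathcal{C}^n(Q_{\mathbf{y}}, \mathcal{X})$,

$$\begin{aligned}
|S(\mathbf{x}|\mathbf{y})| &\le |\{\hat{\mathbf{x}} : H(\hat{\mathbf{x}}|\mathbf{y}) \le H(\mathbf{x}|\mathbf{y})\}| \\
&= \sum_{V \in \mathcal{C}^n(Q_{\mathbf{y}}, \mathcal{X})}\ \sum_{\hat{\mathbf{x}} \in T_V(\mathbf{y}) :\, H(\hat{\mathbf{x}}|\mathbf{y}) \le H(\mathbf{x}|\mathbf{y})} 1 \\
&= \sum_{V \in \mathcal{C}^n(Q_{\mathbf{y}}, \mathcal{X}) :\, H(V|Q_{\mathbf{y}}) \le H(\mathbf{x}|\mathbf{y})} |T_V(\mathbf{y})| \\
&\le \sum_{V \in \mathcal{C}^n(Q_{\mathbf{y}}, \mathcal{X}) :\, H(V|Q_{\mathbf{y}}) \le H(\mathbf{x}|\mathbf{y})} \exp(nH(\mathbf{x}|\mathbf{y})) \\
&\le (n+1)^{|\mathcal{X}||\mathcal{Y}|} \exp(nH(\mathbf{x}|\mathbf{y})).
\end{aligned}$$ ■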
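Lemma 4: For all strings $\mathbf{x}$, $\mathbf{y}$,

$$P(\hat{X}^n \ne X^n | X^n = \mathbf{x}, Y^n = \mathbf{y}) \le \exp(-n(R - H(\mathbf{x}|\mathbf{y}) - \delta_n)^+),$$

where $\delta_n \to 0$ with $n$. Moreover if $(\mathbf{x}, \mathbf{y}) \in \mathcal{E}^c$ then

$$P(\hat{X}^n \ne X^n | X^n = \mathbf{x}, Y^n = \mathbf{y}) = 0.$$

Proof: As noted in the specification of the decoder, for types $Q_X$ such that $\log \gamma(G_X^n(T^n_{Q_X})) \le nR$ the
decoder makes no error. For the opposite case we bound the set of candidate $\hat{\mathbf{x}}$ with $S(\mathbf{x}|\mathbf{y})$, yielding

$$\begin{aligned}
P(\hat{X}^n \ne X^n | X^n = \mathbf{x}, Y^n = \mathbf{y}) &\le \sum_{\hat{\mathbf{x}} \in S(\mathbf{x}|\mathbf{y})} \exp(-nR) \\
&\le (n+1)^{|\mathcal{X}||\mathcal{Y}|} \exp(-n(R - H(\mathbf{x}|\mathbf{y}))) \\
&\le \exp(-n(R - H(\mathbf{x}|\mathbf{y}) - \delta_n)).
\end{aligned}$$

Using the fact that $P(\hat{X}^n \ne X^n | X^n = \mathbf{x}, Y^n = \mathbf{y}) \le 1$ gives the result. ■

Lemma 5: Let $G$ be a graph, let $\delta_n > 0$, $\tilde{\delta}_n > 0$, $\bar{\delta}_n$ be sequences converging to zero, let

$$F_n(Q_{XY}) = \begin{cases} D(Q_{XY} \| P_{XY}) + (R - H_Q(X|Y) - \delta_n)^+ - \bar{\delta}_n & \text{if } \kappa(G, Q_X) \ge R - \tilde{\delta}_n \\ \infty & \text{otherwise,} \end{cases}$$

$$F(Q_{XY}) = \begin{cases} D(Q_{XY} \| P_{XY}) + (R - H_Q(X|Y))^+ & \text{if } \kappa(G, Q_X) \ge R \\ \infty & \text{otherwise,} \end{cases}$$

and let $Q^{(n)}_{XY}$ be a sequence of distributions converging to $Q^{\infty}_{XY}$. Then

$$\liminf_{n \to \infty} F_n(Q^{(n)}_{XY}) \ge F(Q^{\infty}_{XY}). \quad (12)$$

Proof: We proceed by cases. Case 1: $Q^{\infty}_{XY}$ is such that $\kappa(G, Q^{\infty}_X) \ge R$. If $\kappa(G, Q^{(n)}_X) < R - \tilde{\delta}_n$ for all
sufficiently large $n$, then the left-hand side is infinity and the result trivially holds. Otherwise we appeal
to the semicontinuity of the information measures.

Case 2: $Q^{\infty}_{XY}$ is such that $\kappa(G, Q^{\infty}_X) < R$. In this case we see, by appealing to Property 3 of $\kappa$, that
$\limsup \kappa(G, Q^{(n)}_X) < R$, whence (12) holds with equality eventually. ■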

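Proof of Theorem 2: For any $\epsilon > 0$, we note that for sufficiently large $n$ the constraint (10) is met.

Let

$$\mathcal{T}^n = \{Q_{XY} \in \mathcal{P}_n(\mathcal{X} \times \mathcal{Y}) : \log \gamma(G_X^n(T^n_{Q_X})) > nR\}.$$

We begin by partitioning the sequence space by joint type and computing the error probability for each
type:

$$\begin{aligned}
P_e &= \sum_{Q_{XY}}\ \sum_{(\mathbf{x}, \mathbf{y}) \in T^n_{Q_{XY}}} P(\hat{X}^n \ne X^n, X^n = \mathbf{x}, Y^n = \mathbf{y}) \\
&\overset{(*)}{\le} \sum_{Q_{XY} \in \mathcal{T}^n}\ \sum_{(\mathbf{x}, \mathbf{y}) \in T^n_{Q_{XY}}} \exp(-n(R - H(\mathbf{x}|\mathbf{y}) - \delta_n)^+) \times \exp(-n(D(Q_{XY} \| P_{XY}) + H(Q_{XY}))) \\
&\le \sum_{Q_{XY} \in \mathcal{T}^n} \exp(-n((R - H_Q(X|Y) - \delta_n)^+ + D(Q_{XY} \| P_{XY}))) \\
&\le (n+1)^{|\mathcal{X}||\mathcal{Y}|} \max_{Q_{XY} \in \mathcal{T}^n} \exp(-n((R - H_Q(X|Y) - \delta_n)^+ + D(Q_{XY} \| P_{XY}))),
\end{aligned}$$

where in (*) we applied a standard identity for the probability of a sequence in $T^n_{Q_{XY}}$ and Lemma 4. For
any $G$, $\Delta(G) + 1 \ge \gamma(G)$, thus

$$\mathcal{T}^n \subseteq \{Q_{XY} \in \mathcal{P}_n(\mathcal{X} \times \mathcal{Y}) : \log(\Delta(G_X^n(T^n_{Q_X})) + 1) > nR\}.$$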

Let

$$g_n(G_X, Q_{XY}) = \log(\exp(n[\kappa(G_X, Q_X) + n^{-1}|\mathcal{X}|^2 \log(n+1)]) + 1)$$

and observe that $n^{-1} g_n(G_X, Q_{XY}) \to \kappa(G_X, Q_X)$, and let $\tilde{\delta}_n = n^{-1} g_n(G_X, Q_{XY}) - \kappa(G_X, Q_X)$. Appealing
to Lemma 2 with $\kappa$ in place of $\kappa_n$, we may further bound the set by $\{Q_{XY} \in \mathcal{P}_n(\mathcal{X} \times \mathcal{Y}) : n^{-1} g_n(G_X, Q_{XY}) > R\}$, i.e.

$$\mathcal{T}^n \subseteq \tilde{\mathcal{T}}^n = \{Q_{XY} \in \mathcal{P}_n(\mathcal{X} \times \mathcal{Y}) : \kappa(G_X, Q_X) + \tilde{\delta}_n > R\}.$$

Adopting the definitions from Lemma 5 with $\bar{\delta}_n = n^{-1}|\mathcal{X}||\mathcal{Y}| \log(n+1)$ we see

$$-n^{-1} \log P_e \ge \min_{Q_{XY} \in \mathcal{P}_n(\mathcal{X} \times \mathcal{Y})} F_n(Q_{XY}). \quad (13)$$

Fig. 2. Two example source distributions and their characteristic graphs



For each $n$, let $Q^{(n)}_{XY}$ achieve the minimum in (13). Taking a convergent subsequence and relabelling we
may assume that $Q^{(n)}_{XY} \to Q^{\infty}_{XY}$. Hence

$$\liminf_{n \to \infty} F_n(Q^{(n)}_{XY}) \overset{(*)}{\ge} F(Q^{\infty}_{XY}) \ge \inf_{Q_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})} F(Q_{XY}),$$

where (*) follows from Lemma 5. The inequality

$$\log \gamma(G_X) \ge n^{-1} \log \gamma(G_X^n) \ge n^{-1} \log \gamma(G_X^n(T^n_{Q_X}))$$

implies that we may repeat the argument above to yield the achievable exponent

$$\begin{cases} D(Q_{XY} \| P_{XY}) + (R - H_Q(X|Y))^+ & \text{if } \log \gamma(G_X) \ge R \\ \infty & \text{otherwise.} \end{cases}$$

Taking the maximum of both exponents gives the result. ■

B. Examples 

In this section we compute the exponent of Theorem 2 and compare it with the best previously known
exponents. First we demonstrate a case in which the exponent of Theorem 2 achieves the sphere packing
exponent.

When the side information is a deterministic function of the source, i.e. $Y = f(X)$, Property 2 of $\kappa$ allows
us to compute $\kappa$ explicitly, and the optimization forces the innermost optimization to yield $Q_{Y|X} = P_{Y|X}$,
i.e. the 'deterministic' side information. If we associate a $y$ to each fully connected subgraph in $G_X$, then
we see that

$$\log \gamma(G_X) = \max_y \log |f^{-1}(y)| \ge \max_y H(X|Y = y) \ge H(X|Y).$$

From these observations it follows that the exponent reduces to

$$e_{sp}(R, P_{XY}) = \inf_{Q_{XY} :\, H_Q(X|Y) \ge R} D(Q_{XY} \| P_{XY}),$$

the sphere packing exponent for this problem. Thus our scheme is optimal for all rates and the reliability
function is determined for this problem.

For comparison with previous results we turn to Example A (see Fig. 2). In Figure 3 we plot our
exponent against $e^*_{CK} = \max(e_{CK}, e_{CK,r})$ and $e_{OH}$, where $e_{CK}$ and $e_{CK,r}$ are the expurgated and random
coding exponents of Csiszár and Körner [2], and $e_{OH}$ is the exponent of Oohama and Han [6]:

$$e_{CK} = \inf_{Q_X} \left[ D(Q_X \| P_X) + \inf_{Q_{X\hat{X}} :\, H(\hat{X}|X) \ge R,\ Q_{\hat{X}} = Q_X} E[d_P(X, \hat{X})] + R - H(\hat{X}|X) \right],$$

where

$$d_P(x, \hat{x}) = -\log \sum_y \sqrt{P(y|x) P(y|\hat{x})},$$

and

$$e_{OH} = \inf_{Q_{XY} :\, H(Q_X) \ge R} D(Q_{XY} \| P_{XY}) + (R - H_Q(X|Y))^+.$$



From Figure [3] we see that our exponent lies below the sphere packing exponent and above the random 
coding exponent of Oohama and Han. When compared with e* CK , we see that our exponent agrees 
(numerically) and has the benefit of semi universality. 
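The distance $d_P$ appearing in the expurgated exponent is closely tied to the characteristic graph. Assuming $d_P$ is the Bhattacharyya-type distance $-\log \sum_y \sqrt{P(y|x) P(y|\hat{x})}$, it is finite exactly when some $y$ has positive probability under both $x$ and $\hat{x}$, i.e. when $x = \hat{x}$ or the two symbols are connected in $G_X$. The sketch below illustrates this on a hypothetical channel; it is not one of the paper's examples, whose distributions appear only in Fig. 2.

```python
from math import inf, log2, sqrt

def d_p(channel, x, x_hat):
    """-log2 sum_y sqrt(P(y|x) P(y|x_hat)); infinite when the two rows share no output."""
    overlap = sum(sqrt(p * q) for p, q in zip(channel[x], channel[x_hat]))
    return inf if overlap == 0 else -log2(overlap)

# Hypothetical channel P(y|x): rows indexed by x, columns by y.
# Symbols 0 and 1 share output 0, symbols 1 and 2 share output 1, 0 and 2 share none.
channel = [[1.0, 0.0],
           [0.5, 0.5],
           [0.0, 1.0]]
```

In this toy channel the pairs (0, 1) and (1, 2) have finite distance, while the pair (0, 2), which is not an edge of the characteristic graph, has infinite distance.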

For Example B (Fig. 2), it is clear that any rate in excess of one bit allows the decoder to determine
the source sequence without error. The various error exponents are plotted in Fig. 4. For this example our
exponent is infinite for all rates above 1 bit since $\log \gamma(G_X) = 1$. However $e^*_{CK}$ is finite for some rates
above one bit, and therefore we beat $e^*_{CK}$. Below 1 bit, $e_{OH}$, $e^*_{CK}$ and our exponent appear to agree. The
random coding exponent remains finite for all rates below $\log 3$ bits.

Note 1: Formally, the strongest results of [2] are obtained by using ML decoding in their equation (41),
but the complexity of the optimization makes computation infeasible, even for these simple examples and
exploiting convexity. However, in the particular case of our Example B, we note that if for some $R$ the
exponent $e_{CK}$ is finite, then there exists a $Q_X$ for which

$$\inf_{Q_{X\hat{X}} :\, H(\hat{X}|X) \ge R,\ Q_{\hat{X}} = Q_X} E[d_P(X, \hat{X})] + R - H(\hat{X}|X) < \infty.$$

Then according to [2, Lemma 4], the random variables in their set $\mathcal{V}(Q_{Y|X}, Q_{Y|\hat{X}}, Q, R)$ which give
equality in their equation (28) would give rise to the exponent in their equation (16) being finite. As $e_{CK}$
is finite for some rates above 1 bit, their exponent (41) would be finite; thus, at least for Example B, our
exponent is strictly better than the previously known best exponent.

¹ Please note that the plot and discussion concerning Example A reported in a preliminary version of this work [14] were incorrect.

Fig. 3. Comparing exponents for Example A of Figure 2. Our exponent coincides with $e^*_{CK}$ and both lie below the sphere packing exponent.

Fig. 4. Comparing exponents for Example B of Figure 2. Our exponent is infinite for all rates above 1 bit. $e^*_{CK}$ is finite for some rates
above 1 bit.

Note 2: In general one also sees that our exponent is never worse than the Oohama and
Han exponent because, by Property 1 of $\kappa$, nature is forced to optimize over a smaller set of distributions.
Put another way, compared to the Oohama and Han exponent, we are able to 'expurgate' more types.

VI. Improved Exponents for Wyner-Ziv 

When dealing with lossy reproduction it is often convenient to use 'covering' (i.e. quantization) followed
by binning, and in this section we describe how use of the characteristic graph can yield improved error
exponents in such scenarios. We focus on lossy compression with side information, i.e. Wyner-Ziv [7].
Formally the error exponent problem in this case is as follows.

Let $\hat{\mathcal{X}}$ be the reproduction alphabet and $d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ a single letter distortion measure. Define the
distortion between two strings as $d(\mathbf{x}, \hat{\mathbf{x}}) = \frac{1}{n} \sum_{i=1}^n d(x_i, \hat{x}_i)$. The encoder/decoder pair are functions

$f_n : \mathcal{X}^n \to \mathcal{M}$ and $g_n : \mathcal{M} \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n$, where $\mathcal{M}$ is a fixed set.

Let $\hat{X}^n = g_n(f_n(X^n), Y^n)$ be the decoder's output and define the error probability

$$P_e(f_n, g_n, \Delta, d) = P(d(X^n, \hat{X}^n) > \Delta). \quad (14)$$

We define the Wyner-Ziv error exponent to be

$$\theta(R, \Delta, P_{XY}, d) = \lim_{\epsilon \downarrow 0} \liminf_{n \to \infty} -\frac{1}{n} \log \left[ \min_{(f_n, g_n)} P_e(f_n, g_n, \Delta, d) \right], \quad (15)$$

where the minimization ranges over all encoder/decoder pairs satisfying

$$\log |\mathcal{M}| \le n(R + \epsilon). \quad (16)$$

Before we state the result we define another graph functional.
Definition 2:

$$\kappa_2(P_{XY}, Q_{XYU}) = [\kappa(G_U, Q_U) - H(Q_{U|X}|Q_X)]^+,$$

where the graph $G_U$ is defined from the distribution

$$Q_{UY}(u, y) = \sum_x P_{XY}(x, y) Q_{U|X}(u|x).$$

Note: Since $P_{XY}$ will be fixed throughout, we will abbreviate to $\kappa_2(Q_{XYU})$ or even simply $\kappa_2(Q_X)$.
Our first result in this section is Theorem 3.

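Theorem 3: Let $P_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ and $R > 0$, $\Delta > 0$, $d(\cdot, \cdot)$ be given. Then

$$\theta(R, \Delta, P_{XY}, d) \ge \inf_{Q_X} \sup_{Q_{U|X}} \inf_{Q_Y} \sup_{\phi \in \mathcal{F}} \inf_{Q_{XYU}} \eta(R, P_{XY}, Q_{XYU}, \phi),$$

where

$$\eta(R, P_{XY}, Q_{XYU}, \phi) = \begin{cases}
D(Q_{XYU} \| P_{XY} Q_{U|X}) & \text{if } E_Q[d(X, \phi(Y, U))] > \Delta \\
D(Q_{XYU} \| P_{XY} Q_{U|X}) + [R - I_Q(X; U) + I_Q(Y; U)]^+ & \text{if } E_Q[d(X, \phi(Y, U))] \le \Delta \text{ and } \kappa_2(P_{XY}, Q_{XYU}) \ge R \\
\infty & \text{otherwise}
\end{cases}$$

and $\mathcal{F} = \{\phi \,|\, \phi : \mathcal{Y} \times \mathcal{U} \to \hat{\mathcal{X}}\}$. Note that in the final minimization over $Q_{XYU}$, $Q_{XU}$ and $Q_Y$ are fixed to be
those specified earlier in the optimization.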
Discussion of Result

In [8], the present authors determined an achievable exponent for the Wyner-Ziv problem, obtained by
replacing $\eta$ in Theorem 3 with

$$\eta_D(R, P_{XY}, Q_{XYU}, \phi) = \begin{cases}
D(Q_{XYU} \| P_{XY} Q_{U|X}) & \text{if } E_Q[d(X, \phi(Y, U))] > \Delta \\
D(Q_{XYU} \| P_{XY} Q_{U|X}) + [R - I_Q(X; U) + I_Q(Y; U)]^+ & \text{if } E_Q[d(X, \phi(Y, U))] \le \Delta \text{ and } I(X; U) \ge R \\
\infty & \text{otherwise,}
\end{cases}$$

the difference being the conditions under which we switch from case 2 to case 3. Theorem 3 is obtained by
modifying the scheme in [8], taking into account the graph-based expurgation established in the previous
section. Recalling Property 1 of $\kappa$, we have the following inequality:

$$\kappa_2(Q_{XYU}) = [\kappa(G_U, Q_U) - H(U|X)]^+ \le [H(U) - H(U|X)]^+ = I(X; U).$$

Therefore for any $R$, $P_{XY}$, $\phi$ and $Q_{XYU}$ we see that $\eta_D(R, P_{XY}, Q_{XYU}, \phi) \le \eta(R, P_{XY}, Q_{XYU}, \phi)$, and the
present modification yields an achievable exponent that is never any worse than the result of [8].



A. Sketch of Scheme 

Operating at blocks of length $n$, for each type $Q_X$, a test channel $Q^*_{U|X}(Q_X) = Q_{U^*|X}$ is selected. The
test channel is used to generate a codebook, $B_n(Q_X)$, of approximately $2^{nI(U^*; X)}$ codewords. The key
insight is that the (random) graph $B_n(Q_X) \cap G^n_{U^*}$, constructed from

$$Q_{U^*Y}(u, y) = \sum_x P_{XY}(x, y) Q_{U^*|X}(u|x),$$

plays the same role in this problem as did the characteristic graph of the source $P_{XY}$ in the
Slepian-Wolf problem.

In this modified scheme, the encoder first communicates the type of $X^n$ and then, if there is sufficient
rate, i.e. $nR \ge \log \gamma(B_n(Q_X) \cap G^n_{U^*})$, rather than communicating a bin index the encoder may send the
color of the codeword in the graph $G^n_{U^*}$. If there is insufficient rate, then the encoder communicates a
bin index of the codeword. For each pair of marginal types $(Q_X, Q_Y)$ the decoder can choose an estimation
function $\phi$ and, depending on the case, decodes using either the graph or a minimum empirical entropy
decoder. The estimation function is then used to combine the side information and the codeword to yield
the reproduction.



B. Deterministic Side Information 

We now use the result of Theorem 3 to determine the reliability function when the side information is
a deterministic function of the source, i.e. $Y = f(X)$ a.s. for a deterministic $f$. We first note that in this
case, the solution to the innermost optimization must be $Q_{Y|XU} = P_{Y|X}$, or else the exponent is infinite.
This reduces the problem to

$$\inf_{Q_X} \sup_{Q_{U|X}, \phi} \eta(R, P_{XY}, Q_{XYU}, \phi),$$

where the distribution $Q_{XYU}$ is $Q_X P_{Y|X} Q_{U|X}$, i.e. $U$, $X$ and $Y$ form a Markov chain in that order. We
can massage the exponent $\inf_{Q_X} \sup_{Q_{U|X}, \phi} \eta(R, P_{XY}, Q_{XYU}, \phi)$ as follows:



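\[
\inf_{Q_X} \sup_{Q_{U|X},\, \phi}
\begin{cases}
D(Q_{XYU}\|P_{XY}Q_{U|X}) & \text{if } E_Q[d(X,\phi(Y,U))] > \Delta \\
D(Q_{XYU}\|P_{XY}Q_{U|X}) + [R - I_Q(X;U) + I_Q(Y;U)]^+ & \text{if } E_Q[d(X,\phi(Y,U))] \le \Delta \text{ and } \kappa_2(Q_{XYU}) \ge R \\
\infty & \text{otherwise}
\end{cases}
\]
\[
\ge \inf_{Q_X} \sup_{Q_{U|X}:\, Y=\nu(U),\, \phi}
\begin{cases}
D(Q_{XYU}\|P_{XY}Q_{U|X}) & \text{if } E_Q[d(X,\phi(Y,U))] > \Delta \\
D(Q_{XYU}\|P_{XY}Q_{U|X}) + [R - I_Q(X;U) + I_Q(Y;U)]^+ & \text{if } E_Q[d(X,\phi(Y,U))] \le \Delta \text{ and } [H(U|Y) - H(U|X)]^+ \ge R \\
\infty & \text{otherwise,}
\end{cases}
\]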
where the previous inequality follows because we maximize over a smaller set. The notation $Q_{U|X} : Y = \nu(U)$
means we consider only those test channels that result in $Y$ being a deterministic function $\nu$ of $U$.
By construction $U$, $X$ and $Y$ still form a Markov chain in that order, thus $H(U|X) = H(U|XY)$ and we
can continue the chain of equalities with



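\[
\inf_{Q_X} \sup_{Q_{U|X}:\, Y=\nu(U),\, \phi}
\begin{cases}
D(Q_{XYU}\|P_{XY}Q_{U|X}) & \text{if } E_Q[d(X,\phi(Y,U))] > \Delta \\
D(Q_{XYU}\|P_{XY}Q_{U|X}) + [R - I_Q(X;U|Y)]^+ & \text{if } E_Q[d(X,\phi(Y,U))] \le \Delta \text{ and } I(X;U|Y) \ge R \\
\infty & \text{otherwise.}
\end{cases}
\]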



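Note now that the only difference between $Q_{XYU}$ and $P_{XY}Q_{U|X}$ occurs in $Q_X$, so it follows that the
quantity above can be written as

\[
\inf_{Q_X} \sup_{Q_{U|X}:\, Y=\nu(U),\, \phi}
\begin{cases}
D(Q_X\|P_X) & \text{if } E_Q[d(X,\phi(Y,U))] > \Delta \text{ or } I(X;U|Y) \ge R \\
\infty & \text{otherwise}
\end{cases}
\]
\[
= \inf_{Q_X} \sup_{Q_{U|X},\, \phi}
\begin{cases}
D(Q_X\|P_X) & \text{if } E_Q[d(X,\phi(Y,U))] > \Delta \text{ or } I(X;U|Y) \ge R \\
\infty & \text{otherwise.}
\end{cases}
\]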



To argue the final equality, let $Q_X$ and $R$ be fixed. The direction $\le$ is clear since we maximize over a
larger set. For $\ge$, it suffices to show that if the optimization on the left side yields $D(Q_X\|P_X)$ then so
does the optimization on the right. On account of the fact that the objective is piecewise constant (over
$Q_{U|X}$ and $\phi$), when the left side is finite, there exists a $Q^*_{U|X} : Y = \nu(U)$ and $\phi$ causing evaluation
to $D(Q_X\|P_X)$. Suppose by way of contradiction there exists a non-deterministic $Q_{U|X}$ which yields an
infinite exponent. This means that

\[
I(X;U|Y) < R \quad \text{and} \quad E_Q[d(X,\phi(Y,U))] \le \Delta,
\]

but then by Lemma 6 (which follows) we can find a deterministic $\tilde{Q}_{U|X}$ and corresponding $\tilde{\phi}$ with the
property that

\[
I(X;\tilde{U}|Y) < R \quad \text{and} \quad E_Q[d(X,\tilde{\phi}(Y,\tilde{U}))] \le \Delta,
\]

implying that $\tilde{Q}_{U|X}$ would yield an infinite exponent, contradicting the optimality of $Q^*_{U|X}$.

Lemma 6: Let $Q_X$ be given and let $Y = f(X)$ with $P_{Y|X}$ denoting the induced conditional distribution.
Then for any $Q_{U|X}$, $\phi$, there exists a $\tilde{Q}_{U|X}$ and $\tilde{\phi}$ so that when $Q_{XY\tilde{U}} = Q_X \tilde{Q}_{U|X} P_{Y|X}$,

1) $E_{Q_{XYU}}[d(X,\phi(Y,U))] = E_{Q_{XY\tilde{U}}}[d(X,\tilde{\phi}(Y,\tilde{U}))]$,
2) $I(X;U|Y) = I(X;\tilde{U}|Y)$,
3) $Y = \nu(\tilde{U})$ for some deterministic function $\nu$.

Proof: Define $\tilde{U} = (U,Y)$ and $\tilde{\phi}(Y,\tilde{U}) = \phi(Y,U)$. Then clearly conditions 1 and 3 hold. To see
condition 2 note by the chain rule

\[
I(X;\tilde{U}|Y) = I(X;U,Y|Y) = I(X;U|Y) + I(X;Y|Y,U) = I(X;U|Y).
\]

Finally we point out that since $Y = f(X)$ we also have $\tilde{U} \leftrightarrow X \leftrightarrow Y$. ■
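Condition 2 of Lemma 6 can be checked numerically. The following is a minimal sketch on a hypothetical three-letter source with $Y = f(X)$; the distributions `Q_X`, `Q_U_GIVEN_X` and the map `f` are made up for illustration and are not from the paper.

```python
import math
from collections import defaultdict

def cond_mutual_info(joint):
    """I(A; B | C) in bits for a joint pmf given as {(a, b, c): probability}."""
    p_c, p_ac, p_bc = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, c), p in joint.items():
        p_c[c] += p
        p_ac[(a, c)] += p
        p_bc[(b, c)] += p
    return sum(p * math.log2(p * p_c[c] / (p_ac[(a, c)] * p_bc[(b, c)]))
               for (a, b, c), p in joint.items() if p > 0)

def f(x):
    """Hypothetical deterministic side information Y = f(X)."""
    return 0 if x == 0 else 1

Q_X = [0.5, 0.3, 0.2]                                  # hypothetical source type
Q_U_GIVEN_X = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]     # hypothetical test channel

joint_with_u = {}        # keys (x, u, y)
joint_with_u_tilde = {}  # keys (x, (u, y), y), i.e. U replaced by U~ = (U, Y)
for x, qx in enumerate(Q_X):
    for u, qu in enumerate(Q_U_GIVEN_X[x]):
        y = f(x)
        joint_with_u[(x, u, y)] = qx * qu
        joint_with_u_tilde[(x, (u, y), y)] = qx * qu

I_ORIG = cond_mutual_info(joint_with_u)          # I(X; U | Y)
I_TILDE = cond_mutual_info(joint_with_u_tilde)   # I(X; U~ | Y)
```

Appending $Y$ to $U$ makes $Y$ a function of the new auxiliary variable without changing the conditional mutual information, which is what the proof's chain rule argument asserts.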
Rewriting this final optimization problem as

\[
\inf_{Q_X} \sup_{Q_{U|X},\, \phi}
\begin{cases}
D(Q_X\|P_X) & \text{if } E_Q[d(X,\phi(Y,U))] > \Delta \text{ or } I(X;U|Y) \ge R \\
\infty & \text{otherwise}
\end{cases}
\;=\; \inf_{Q_X:\, R_{WZ}(\Delta,Q_X) \ge R} D(Q_X\|P_X)
\;\le\; \pi(R, \Delta, P_{XY}, d),
\]

where $R_{WZ}(\Delta, Q_X)$ denotes the Wyner-Ziv rate distortion function for the source with $X \sim Q_X$ and
$Y = f(X)$ with distortion measure $d$. But according to the change-of-measure argument of [8, Theorem
4],

\[
\pi(R, \Delta, P_{XY}, d) \le \inf_{Q_X:\, R_{WZ}(\Delta,Q_X) \ge R} D(Q_X\|P_X).
\]

Thus our scheme is optimal in the sense that it meets the change-of-measure upper bound.

VII. Connection to Channel Coding 

In this section we demonstrate that $\kappa$ has applications in zero-error channel coding problems. Let
$G = G(W)$ be the characteristic graph of the channel $W$, and $c(G)$ denote the zero error capacity (see
[11, Section III] for definitions). The independence number of a graph, denoted $\alpha(G)$, is the maximum
cardinality of a set of vertices of $G$ of which no two are adjacent. We recall that $c(G) \ge \log \alpha(G)$.
According to [12, pg 187 (prob. 18)]

\[
\log \alpha(G) = \max_{P} \; \min_{\substack{P_{X\hat{X}}:\; P_X = P_{\hat{X}} = P \\ E[d_W(X,\hat{X})] < \infty}} I(X;\hat{X}).
\]

Expanding the mutual information gives

\[
\log \alpha(G) = \max_{P} \; \min_{\substack{P_{X\hat{X}}:\; P_X = P_{\hat{X}} = P \\ E[d_W(X,\hat{X})] < \infty}} \big[ H(X) - H(X|\hat{X}) \big]
= \max_{P} \Big[ H(P) - \max_{\substack{P_{X\hat{X}}:\; P_X = P_{\hat{X}} = P \\ E[d_W(X,\hat{X})] < \infty}} H(X|\hat{X}) \Big].
\]



If Px V = P xx then E[d w (X, X)] < oo is equivalent to V < G. To see this note that E[d w (X, X)} < oo 
if for all $x, \hat{x}$ s.t. $P(x,\hat{x}) > 0$, there is some $y$ for which $W(y|x)W(y|\hat{x}) > 0$, i.e. $(x,\hat{x}) \in E(G)$.
Conversely, if V G, then > only when there is some y for which W \y\x)W \y\x) > 0. 

Hence,

\[
\log \alpha(G) = \max_{P} \big[ H(P) - \kappa(G,P) \big].
\]

Thus $\kappa$ provides a lower bound on the zero error capacity of a channel $W$.
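To make the quantity $\log \alpha(G)$ concrete, the following minimal sketch computes the independence number by brute force. The pentagon graph is used as a hypothetical example of a confusability graph (it is not taken from the text), and the exhaustive search is only practical for very small graphs.

```python
import math
from itertools import combinations

def independence_number(n_vertices, edges):
    """Largest size of a vertex set with no two members adjacent (exhaustive search)."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n_vertices, 0, -1):
        for subset in combinations(range(n_vertices), size):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                return size
    return 0

# Hypothetical example: 5 inputs, each confusable with its two cyclic neighbors.
PENTAGON_EDGES = [(i, (i + 1) % 5) for i in range(5)]
ALPHA = independence_number(5, PENTAGON_EDGES)
ONE_SHOT_BITS = math.log2(ALPHA)   # the lower bound log alpha(G), in bits
```

For the pentagon, at most two inputs can be used without confusion in a single channel use, so this lower bound equals one bit.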

Appendix A 
Proof of Theorem 3

The key to the proof is Lemma 8, a bound on the degree of the codebook graph which holds with
exponentially high probability. With this fact established we give a scheme for coding when the bound
holds and declare an error when the bound does not.

A. Codebook Construction 

Operating on blocks of length $n$, for each type $Q_X$ choose a test channel $Q_{U^*|X} = Q^*_{U|X}(Q_X)$ and let
$Q_{U^*} = Q^*_U(Q_X)$ denote the resulting induced marginal type.^2 The test channel is used to build a codebook
$B^n(Q_X)$ as follows. For each $\mathbf{u} \in T^n_{Q_{U^*}}$, flip a coin with probability of heads

\[
p = \exp\Big(-n\Big[H(Q_{U^*|X}|Q_X) - \frac{3|\mathcal{U}||\mathcal{X}|}{n}\log(n+1)\Big]\Big)
\]

and add $\mathbf{u}$ to the codebook only if the coin comes up heads. Define the distribution

\[
Q_{U^*Y}(u,y) = \sum_{x} P_{XY}(x,y)\, Q_{U^*|X}(u|x)
\]

and let $G_{U^*}$ be the resulting characteristic graph. The codeword for $\mathbf{x} \in T^n_{Q_X}$ is chosen as follows. If
$\mathcal{G}(\mathbf{x}) = B^n(Q_X) \cap T^n_{Q_{U^*|X}}(\mathbf{x})$ is non-empty, choose uniformly from $\mathcal{G}(\mathbf{x})$. If $\mathcal{G}(\mathbf{x})$ is empty, choose uniformly
from $B^n(Q_X)$. We let $U(\mathbf{x})$ denote the chosen codeword. For each codebook, we define a binning function
$b_{Q_X} : B^n(Q_X) \to [1, \ldots, \exp(nR)]$ as follows: for all $\mathbf{u} \in B^n(Q_X)$,

\[
P(b_{Q_X}(\mathbf{u}) = i) = \exp(-nR), \quad \text{for all } i \in [1, \ldots, \exp(nR)].
\]

^2 For brevity we will use the following conventions: the random variable $U^*$ (resp. channel $Q_{U^*|X}$) refers to the random variable (resp.
channel) defined by the choice of test channel for the particular $Q_X$ under consideration.
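The random codebook and binning steps can be sketched as follows. This is a toy illustration only: the candidate set stands in for the type class, and the values of `p_heads` and `num_bins` are arbitrary placeholders rather than the exponential quantities used in the construction.

```python
import random

def build_codebook(candidates, p_heads, rng):
    """Keep each candidate sequence independently with probability p_heads."""
    return [u for u in candidates if rng.random() < p_heads]

def assign_bins(codebook, num_bins, rng):
    """Give every codeword an independent, uniformly chosen bin index in 1..num_bins."""
    return {u: rng.randrange(1, num_bins + 1) for u in codebook}

rng = random.Random(0)                                # fixed seed for repeatability
CANDIDATES = [format(i, "04b") for i in range(16)]    # stand-in for the type class
CODEBOOK = build_codebook(CANDIDATES, p_heads=0.5, rng=rng)
BINS = assign_bins(CODEBOOK, num_bins=4, rng=rng)
```

Each codeword lands in a given bin with probability `1 / num_bins`, mirroring the requirement that every bin index has probability $\exp(-nR)$.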

B. Scheme

In Lemmas 7 and 8 we establish that

\[
\gamma(G^n_{U^*} \cap B^n(Q_X)) \le \Delta(G^n_{U^*} \cap B^n(Q_X)) + 1 \overset{\text{w.h.p.}}{\le} \exp(n[\kappa_2(Q_X) + \lambda_n + \delta_n]) + 1,
\]

for some $\lambda_n > 0$, $\delta_n \to 0$ as $n \to \infty$, and where w.h.p. stands for probability tending to 1 as $n \to \infty$. For
types in which the above bound fails to hold, we send an error message to the decoder. For types
in which the bound holds, the scheme is as follows. To communicate the codeword to the decoder, the
encoder may either give an index into the codeword set $B^n$ or, using the ideas from the improved lossless
binning scheme, it can color the graph $G^n_{U^*} \cap B^n(Q_X)$ using a minimal coloring and send the color of
the codeword.
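The inequality between the number of colors and the maximum degree can be seen from a greedy coloring. The sketch below is illustrative only: greedy coloring is swapped in for the minimal coloring used by the scheme, and it merely demonstrates that $\Delta + 1$ colors always suffice. The pentagon graph is a hypothetical example.

```python
def greedy_coloring(adjacency):
    """Give each vertex the smallest color not already used by a colored neighbor."""
    colors = {}
    for v in adjacency:
        used = {colors[w] for w in adjacency[v] if w in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

# Hypothetical example graph: a 5-cycle, every vertex has degree 2.
PENTAGON = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
COLORS = greedy_coloring(PENTAGON)
NUM_COLORS = len(set(COLORS.values()))
MAX_DEGREE = max(len(nbrs) for nbrs in PENTAGON.values())
```

A vertex has at most `MAX_DEGREE` colored neighbors when it is processed, so some color among the first `MAX_DEGREE + 1` is always free; this is why a degree bound translates into a rate sufficient for sending the color.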
Encoder:

The encoder first sends $k(Q_X)$, the index of the type $Q_X$ of the source sequence. If $\exp(n[\kappa_2(Q_X) + \lambda_n + \delta_n]) + 1 \le \exp(nR)$,
the encoder transmits the color of the codeword in the graph $G^n_{U^*} \cap B^n(Q_X)$. Otherwise it sends
the bin index $b_{Q_X}(U(\mathbf{x}))$. Formally, we denote the encoder by $f_n : \mathcal{X}^n \to \mathcal{M}$, where

\[
\mathcal{M} = [1, \ldots, (n+1)^{|\mathcal{X}|}] \times [1, \ldots, \exp(nR)].
\]

Decoder:

The decoder receives a type index, a message and the side information $\mathbf{y}$. If $\exp(n[\kappa_2(Q_X) + \lambda_n]) + 1 \le \exp(nR)$
then the codeword can be decoded without error. In the opposite case, the decoder searches the
bin for a unique codeword $\hat{\mathbf{u}}$ such that $H(\hat{\mathbf{u}}|\mathbf{y}) < H(\mathbf{u}|\mathbf{y})$ for all other $\mathbf{u}$ in the received bin. If there is no
such unique codeword, the decoder chooses $\hat{\mathbf{u}}$ uniformly at random from the received bin. For each pair
of types $Q_X, Q_Y$, the decoder picks a reproduction function $\phi$, and declares the output as

\[
\hat{\mathbf{x}} \quad \text{where} \quad \hat{x}_j = \phi(u_j, y_j).
\]

Thus the decoder $g_n : \mathcal{Y}^n \times \mathcal{M} \to \hat{\mathcal{X}}^n$ is specified.
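The bin search performed by the decoder can be sketched as a minimum empirical conditional entropy rule. The sequences below are hypothetical, and for simplicity this sketch returns `None` on a tie instead of drawing a codeword at random as the scheme does.

```python
import math
from collections import Counter

def empirical_cond_entropy(u, y):
    """Empirical conditional entropy H(u|y) in bits, from the joint type of (u, y)."""
    n = len(u)
    joint = Counter(zip(u, y))
    marg_y = Counter(y)
    return -sum(c / n * math.log2(c / marg_y[b]) for (_, b), c in joint.items())

def min_entropy_decode(bin_codewords, y):
    """Return the unique minimizer of H(u|y) over a non-empty bin, or None on a tie."""
    scored = sorted((empirical_cond_entropy(u, y), u) for u in bin_codewords)
    if len(scored) > 1 and math.isclose(scored[0][0], scored[1][0]):
        return None
    return scored[0][1]

Y = "00110011"                                    # hypothetical side information
BIN = ["00110011", "01010101", "00001111"]        # hypothetical received bin
DECODED = min_entropy_decode(BIN, Y)
```

Here the first codeword is a deterministic function of the side information and therefore has zero empirical conditional entropy, while the other two each have one bit, so the first is selected.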
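Lemma 7: Let

\[
\delta_n = \frac{3|\mathcal{U}||\mathcal{X}|}{n}\log(n+1) \quad \text{and} \quad \tilde{\delta}_n = \frac{|\mathcal{U}||\mathcal{U}|}{n}\log(n+1),
\]
\[
\kappa_2^n(Q_X) = \kappa_2(Q_X) + \delta_n \quad \text{and} \quad \lambda_n = \frac{2}{n}\log(n+1) + \tilde{\delta}_n.
\]

Then for all $n$ sufficiently large and for all types $Q_X$,

\[
P\big(\Delta(G^n_{U^*} \cap B^n(Q_X)) \ge \exp(n[\kappa_2^n(Q_X) + \lambda_n])\big) \le \exp_e(-(n+1)^2).
\]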
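Note the randomness in $\Delta(G^n_{U^*} \cap B^n(Q_X))$ comes from the fact that $B^n(Q_X)$ is a random set.
Proof: Let $K = 2^{n[\kappa_2^n(Q_X) + \lambda_n]}$, then

\[
P(\Delta(G^n_{U^*} \cap B^n(Q_X)) \ge K)
= P(\exists\, \mathbf{u} \in T_{Q_{U^*}} : \mathbf{u} \in B^n(Q_X),\ \Delta(\mathbf{u}) \ge K)
\]
\[
\le \sum_{\mathbf{u} \in T_{Q_{U^*}}} P(\mathbf{u} \in B^n(Q_X))\, P(\Delta(\mathbf{u}) \ge K \mid \mathbf{u} \in B^n(Q_X))
\le \sum_{\mathbf{u} \in T_{Q_{U^*}}} P(\Delta(\mathbf{u}) \ge K \mid \mathbf{u} \in B^n(Q_X)).
\]

Let $N(\mathbf{u})$ denote the neighbors of $\mathbf{u}$ in the graph $G^n_{U^*}$, then the quantity in the previous line is upper bounded
by

\[
\sum_{\mathbf{u} \in T_{Q_{U^*}}} P\Big( \sum_{\mathbf{v} \in N(\mathbf{u})} 1_{\{\mathbf{v} \in B^n\}} \ge K \Big).
\]

From the construction of the codebook, we know that for each string $\mathbf{v}$, $1_{\{\mathbf{v} \in B^n\}}$ is Bernoulli with parameter
$p$. Furthermore, by Lemma 5 we know that $|N(\mathbf{u})| \le \exp(n[\kappa(G_U, Q_U) + \tilde{\delta}_n]) = J(Q_X)$. Therefore, by
bounding the number of terms in the summation, letting $D_i$ be a sequence of i.i.d. Bernoulli($p$) random
variables, we have

\[
P(\Delta(G^n_{U^*} \cap B^n(Q_X)) \ge K) \le |T_{Q_{U^*}}|\, P\Big( \sum_{i=1}^{J(Q_X)} D_i \ge K \Big).
\]

Focusing on the probability, using the exponential form of Markov's inequality, one has for any $\theta > 0$

\[
P\Big( \sum_{i=1}^{J(Q_X)} D_i \ge K \Big)
\le \frac{\exp_e(J(Q_X)\ln(1 + p(e^{\theta} - 1)))}{\exp_e(\theta K)}
\le \frac{\exp_e(J(Q_X)\, p\, (e^{\theta} - 1))}{\exp_e(\theta K)}
\le \frac{\exp_e(J(Q_X)\, p\, e^{\theta})}{\exp_e(\theta K)}
\]
\[
\le \exp_e\big( 2^{n[\kappa_2(Q_X) + \delta_n + \tilde{\delta}_n] + \theta \log e} - \theta\, 2^{n[\kappa_2(Q_X) + \delta_n + \lambda_n]} \big). \qquad (17)
\]

Choosing $\theta = 1$, we have

\[
P\Big( \sum_{i=1}^{J(Q_X)} D_i \ge K \Big) \le \exp_e\big( 2^{n[\kappa_2(Q_X) + \delta_n + \tilde{\delta}_n]} (2^{\log e} - (n+1)^2) \big).
\]

For $n \ge 1$, $(e - (n+1)^2) \le -1$, hence

\[
P(\Delta(G^n_{U^*} \cap B^n(Q_X)) \ge K) \le |T_{Q_{U^*}}| \exp_e\big(-2^{n[\kappa_2(Q_X) + \delta_n + \tilde{\delta}_n]}\big)
\le |T_{Q_{U^*}}| \exp_e\big(-2^{n\delta_n}\big)
\le |T_{Q_{U^*}}| \exp_e(-(n+1)^3),
\]

for all $n$ sufficiently large. Since $|T_{Q_{U^*}}|$ is only exponential in $n$, the result holds. ■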
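The exponential form of Markov's inequality used in this proof can be checked numerically against the exact binomial tail. The sketch below uses the intermediate bound $\exp_e(Jp(e^{\theta}-1) - \theta K)$; the values of `J`, `P` and `K` are arbitrary small placeholders, not the exponentially large quantities of the lemma.

```python
import math

def binomial_tail(j, p, k):
    """Exact P(D_1 + ... + D_j >= k) for i.i.d. Bernoulli(p) variables."""
    return sum(math.comb(j, i) * p**i * (1 - p)**(j - i) for i in range(k, j + 1))

def chernoff_bound(j, p, k, theta):
    """exp(j*p*(e^theta - 1) - theta*k), from Markov's inequality applied to e^(theta*S)."""
    return math.exp(j * p * (math.exp(theta) - 1) - theta * k)

J, P, K = 200, 0.01, 12          # placeholder values for illustration
EXACT = binomial_tail(J, P, K)
BOUND = chernoff_bound(J, P, K, theta=1.0)
```

With $\theta = 1$ the bound is already far below one whenever $K$ exceeds the mean $Jp$ by a sufficient factor, which is the regime the proof engineers through the choice of $\lambda_n$.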
On account of the previous lemma, we have a bound, which holds with high probability, on the degree
of $G^n_{U^*} \cap B^n(Q_X)$. For each $Q_{XYU}$, we define the event $F(Q_{XYU})$ as follows

\[
F(Q_{XYU}) = \big\{ \Delta(B^n(Q_X) \cap G^n_{U^*}) \ge \exp(n[\kappa_2^n(Q_X) + \lambda_n]) \big\}.
\]

Lemma 8: For all $n$ sufficiently large and any type $Q_{XYU}$

\[
P(F(Q_{XYU})) \le \exp_e(-(n+1)^2).
\]

Proof: The result follows directly from Lemma 7. ■
In the remainder of this appendix $\kappa_2^n$ and $\lambda_n$ will be defined as in the statement of Lemma 7.



C. Error Analysis 
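Let

\[
\mathcal{E}_1 = \{(\mathbf{x},\mathbf{y},\mathbf{u}) : \mathbf{u} \notin T_{Q^*_{U|X}}(\mathbf{x})\}
\]
\[
\mathcal{E}_2 = \{(\mathbf{x},\mathbf{y},\mathbf{u}) : \mathbf{u} \in T_{Q^*_{U|X}}(\mathbf{x}),\ d(\mathbf{x}, \phi_{Q_{\mathbf{x}},Q_{\mathbf{y}}}(\mathbf{u},\mathbf{y})) \le \Delta,\ \exp(n[\kappa_2^n(Q_{\mathbf{x}}) + \lambda_n]) + 1 > \exp(nR)\}
\]
\[
\mathcal{E}_3 = \{(\mathbf{x},\mathbf{y},\mathbf{u}) : \mathbf{u} \in T_{Q^*_{U|X}}(\mathbf{x}),\ d(\mathbf{x}, \phi_{Q_{\mathbf{x}},Q_{\mathbf{y}}}(\mathbf{u},\mathbf{y})) \le \Delta,\ \exp(n[\kappa_2^n(Q_{\mathbf{x}}) + \lambda_n]) + 1 \le \exp(nR)\}
\]
\[
\mathcal{E}_4 = \{(\mathbf{x},\mathbf{y},\mathbf{u}) : \mathbf{u} \in T_{Q^*_{U|X}}(\mathbf{x}),\ d(\mathbf{x}, \phi_{Q_{\mathbf{x}},Q_{\mathbf{y}}}(\mathbf{u},\mathbf{y})) > \Delta\}
\]

and

\[
\mathcal{D}_1 = \{Q_{XYU} : Q_{U|X} \ne Q^*_{U|X}(Q_X)\}
\]
\[
\mathcal{D}_2 = \{Q_{XYU} : \exp(n[\kappa_2^n(Q_X) + \lambda_n]) + 1 > \exp(nR),\ Q_{U|X} = Q^*_{U|X}(Q_X),\ E_Q[d(X, \phi_{Q_X,Q_Y}(U,Y))] \le \Delta\}
\]
\[
\mathcal{D}_3 = \{Q_{XYU} : \exp(n[\kappa_2^n(Q_X) + \lambda_n]) + 1 \le \exp(nR),\ Q_{U|X} = Q^*_{U|X}(Q_X),\ E_Q[d(X, \phi_{Q_X,Q_Y}(U,Y))] \le \Delta\}
\]
\[
\mathcal{D}_4 = \{Q_{XYU} : Q_{U|X} = Q^*_{U|X}(Q_X),\ E_Q[d(X, \phi_{Q_X,Q_Y}(U,Y))] > \Delta\}.
\]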

The sets defined above and the following Lemmas allow us to bound the error probability for our 
improved scheme. 

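Lemma 9: Let $X^n, Y^n, U^n$ be generated according to our scheme, then for all $n$ sufficiently large and
all $(\mathbf{x},\mathbf{y},\mathbf{u}) \in \mathcal{E}_1$

\[
P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}})) \le \exp(-(n+1)^2).
\]

Proof:

\[
P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}}))
= P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}) \times P(F^c(Q_{\mathbf{xyu}}) \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u})
\le P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}).
\]

Let $A$ denote the event that there does not exist a $\mathbf{u} \in B^n(Q_{\mathbf{x}})$ such that $\mathbf{u} \in T_{Q_{U^*|X}}(\mathbf{x})$. For $(\mathbf{x},\mathbf{y},\mathbf{u}) \in \mathcal{E}_1$,
the event $\{X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}\}$ implies that the event $A$ has occurred. Hence

\[
P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u})
= P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, A)
\le P(X^n = \mathbf{x})\, P(A \mid X^n = \mathbf{x})
\le P(A \mid X^n = \mathbf{x}).
\]

Recalling that $p$ was the probability that each codeword is added to the codebook, we have

\[
P(A \mid X^n = \mathbf{x}) = P(\forall\, \mathbf{u} \in T_{Q_{U^*|X}}(\mathbf{x}) : \mathbf{u} \notin B^n(Q_{\mathbf{x}}))
\le \exp(-p\,|T_{Q_{U^*|X}}(\mathbf{x})|).
\]

For $\mathbf{x} \in T_{Q_X}$ we have the lower bound

\[
|T_{Q_{U^*|X}}(\mathbf{x})| \ge (n+1)^{-|\mathcal{U}||\mathcal{X}|} \exp(nH(Q_{U^*|X}|Q_X));
\]

substituting this and the value of $p$ we get

\[
P(A \mid X^n = \mathbf{x}) \le \exp\Big(-\exp\Big(n\Big[\frac{3|\mathcal{U}||\mathcal{X}|}{n}\log(n+1) - \frac{|\mathcal{U}||\mathcal{X}|}{n}\log(n+1)\Big]\Big)\Big)
\le \exp(-(n+1)^2). \quad ■
\]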



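Lemma 10: Let $(\mathbf{x},\mathbf{y},\mathbf{u}) \in \mathcal{E}_1^c$, then

\[
P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}}))
\le P^n_{XY}(\mathbf{x},\mathbf{y}) \exp(-n[H(Q^*_{U|X}(Q_X)|Q_X) - \delta_n]),
\]

where

\[
\delta_n = \frac{3|\mathcal{U}||\mathcal{X}|}{n}\log(n+1).
\]

Proof: Proceeding as in the proof of Lemma 9, we have

\[
P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}}))
\le P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u})
= P(X^n = \mathbf{x}, Y^n = \mathbf{y})\, P(U^n = \mathbf{u} \mid X^n = \mathbf{x}, Y^n = \mathbf{y}).
\]

Conditional on $\{X^n = \mathbf{x}\}$, the event $\{U^n = \mathbf{u}\}$ is equivalent to $\{\mathbf{u} \in B^n(Q_{\mathbf{x}})\} \cap \{\mathbf{u}$ was chosen among
all $\mathbf{u}' \in B^n(Q_{\mathbf{x}})$ with $\mathbf{u}' \in T_{Q_{U^*|X}}(\mathbf{x})\}$. Bounding the latter probability by 1, we have

\[
P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}}))
\le P^n_{XY}(\mathbf{x},\mathbf{y}) \exp\Big(-n\Big[H(Q^*_{U|X}|Q_X) - \frac{3|\mathcal{U}||\mathcal{X}|}{n}\log(n+1)\Big]\Big). \quad ■
\]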



Lemma 11: For any $Q_{XYU} \notin \mathcal{D}_1$ and any $P_{XY}$

\[
\sum_{(\mathbf{x},\mathbf{y},\mathbf{u}) \in T_{Q_{XYU}}} P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}}))
\le \exp(-n[D(Q_{XYU}\|P_{XY}Q^*_{U|X}(Q_X)) - \delta_n]),
\]

where $\delta_n$ is the same as in the statement of Lemma 10.

Proof: Using the bound of Lemma 10 and the following identity for $(\mathbf{x},\mathbf{y}) \in T_{Q_{XY}}$,

\[
P^n_{XY}(\mathbf{x},\mathbf{y}) = \exp(-n[D(Q_{XY}\|P_{XY}) + H(Q_{XY})]),
\]

we have

\[
\sum_{(\mathbf{x},\mathbf{y},\mathbf{u}) \in T_{Q_{XYU}}} P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}}))
\le \sum_{(\mathbf{x},\mathbf{y},\mathbf{u}) \in T_{Q_{XYU}}} \exp(-n[D(Q_{XY}\|P_{XY}) + H(Q_{XY}) + H(Q_{U|X}|Q_X) - \delta_n])
\]
\[
\le \exp(-n[D(Q_{XY}\|P_{XY}) - H(Q_{U|XY}|Q_{XY}) + H(Q_{U|X}|Q_X) - \delta_n]). \qquad (18)
\]

Applying the identity

\[
D(Q_{XY}\|P_{XY}) - H(Q_{U|XY}|Q_{XY}) + H(Q_{U|X}|Q_X) = D(Q_{XYU}\|P_{XY}Q_{U|X})
\]

in (18) gives the result. ■
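The type-class identity used at the start of this proof, namely that the probability of an i.i.d. sequence depends only on its type, can be verified directly. The sketch below does so for a single-letter alphabet; the distribution `P` and the sequence `SEQ` are hypothetical examples, and logarithms are taken base 2.

```python
import math
from collections import Counter

def sequence_probability(seq, p):
    """Probability of seq under the i.i.d. distribution p (product of letter probabilities)."""
    return math.prod(p[s] for s in seq)

def probability_via_type(seq, p):
    """2^(-n [D(Q||P) + H(Q)]), where Q is the type (empirical distribution) of seq."""
    n = len(seq)
    q = {s: c / n for s, c in Counter(seq).items()}
    entropy = -sum(v * math.log2(v) for v in q.values())
    divergence = sum(v * math.log2(v / p[s]) for s, v in q.items())
    return 2.0 ** (-n * (divergence + entropy))

P = {"a": 0.5, "b": 0.3, "c": 0.2}      # hypothetical source distribution
SEQ = "aabacbba"                        # hypothetical sequence of length 8
DIRECT = sequence_probability(SEQ, P)
VIA_TYPE = probability_via_type(SEQ, P)
```

The two computations agree up to floating-point error, since $D(Q\|P) + H(Q) = -\sum_a Q(a)\log P(a)$.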

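Lemma 12: For $n$ sufficiently large and $(\mathbf{x},\mathbf{y},\mathbf{u}) \in \mathcal{E}_2$

\[
P(d(X^n, \hat{X}^n) > \Delta \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}}))
\le \frac{\exp(-n[R - I_{Q_{\mathbf{xyu}}}(X;U) + I_{Q_{\mathbf{xyu}}}(U;Y) - \tilde{\delta}_n]^+)}{1 - \exp_e(-(n+1)^2)},
\]

where $\tilde{\delta}_n \to 0$ as $n \to \infty$.

Proof: Let $L$ be the event that the decoder decodes the wrong codeword, i.e.

\[
L = \{\exists\, \tilde{\mathbf{u}} \ne U(X^n) : H(\tilde{\mathbf{u}}|\mathbf{y}) \le H(U(X^n)|\mathbf{y}),\ \tilde{\mathbf{u}} \in B^n(Q_{X^n}),\ b_{Q_{X^n}}(U(X^n)) = b_{Q_{X^n}}(\tilde{\mathbf{u}})\}
\]

and note that $\{d(X^n, \hat{X}^n) > \Delta\} \cap \mathcal{E}_2 \subseteq L$. We can bound the conditional probability of $L$ as follows

\[
P(L \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}}))
= \frac{P(L, F^c(Q_{\mathbf{xyu}}) \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u})}{P(F^c(Q_{\mathbf{xyu}}) \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u})}
\le \frac{P(L \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u})}{P(\Delta(B^n(Q_{\mathbf{x}}) \cap G^n_{U^*}) < \exp(n[\kappa_2^n(Q_{\mathbf{x}}) + \lambda_n]))}.
\]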

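We now bound the numerator. Recalling the definition of $S(\mathbf{u}|\mathbf{y})$ from Lemma 3 and invoking the union
bound gives

\[
P(L \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u})
\le \sum_{\tilde{\mathbf{u}} \in S(\mathbf{u}|\mathbf{y})} P(\tilde{\mathbf{u}} \in B^n(Q_{\mathbf{x}}),\ b_{Q_{\mathbf{x}}}(\tilde{\mathbf{u}}) = b_{Q_{\mathbf{x}}}(\mathbf{u})),
\]

and substituting the various bounds gives

\[
\exp(-n[R - I_{Q_{\mathbf{xyu}}}(X;U) + I_{Q_{\mathbf{xyu}}}(U;Y) - \tilde{\delta}_n]^+),
\]

where $\tilde{\delta}_n$ is of order $\frac{1}{n}\log(n+1)$. To handle the denominator, by Lemma 7 the probability of the complementary event goes to
zero super-exponentially as $n \to \infty$. ■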
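Lemma 13: Let $\delta_n, \tilde{\delta}_n, \bar{\delta}_n, \hat{\delta}_n$ be positive sequences converging to 0 as $n \to \infty$,

\[
\eta_n(R, P_{XY}, Q_{XYU}, \phi) =
\begin{cases}
D(Q_{XYU}\|P_{XY}Q_{U|X}) - \delta_n & \text{if } E_Q[d(X,\phi(Y,U))] > \Delta \\
D(Q_{XYU}\|P_{XY}Q_{U|X}) - \delta_n + [R - I_Q(X;U) + I_Q(Y;U) - \tilde{\delta}_n]^+ - \bar{\delta}_n & \text{if } E_Q[d(X,\phi(Y,U))] \le \Delta \text{ and } \kappa_2^n(Q_X) + \lambda_n \ge R - \hat{\delta}_n \\
\infty & \text{otherwise,}
\end{cases}
\]
\[
\beta_n(R, \Delta, P_{XY}, d) = \min_{Q_X} \max_{Q_{U|X}} \min_{Q_Y} \max_{\phi} \min_{Q_{XYU}} \eta_n(R, P_{XY}, Q_{XYU}, \phi),
\]
\[
\eta(R, P_{XY}, Q_{XYU}, \phi) =
\begin{cases}
D(Q_{XYU}\|P_{XY}Q_{U|X}) & \text{if } E_Q[d(X,\phi(Y,U))] > \Delta \\
D(Q_{XYU}\|P_{XY}Q_{U|X}) + [R - I_Q(X;U) + I_Q(Y;U)]^+ & \text{if } E_Q[d(X,\phi(Y,U))] \le \Delta \text{ and } \kappa_2(Q_X) \ge R \\
\infty & \text{otherwise}
\end{cases}
\]
\[
\text{and} \quad \beta(R, \Delta, P_{XY}, d) = \inf_{Q_X} \sup_{Q_{U|X}} \inf_{Q_Y} \sup_{\phi} \inf_{Q_{XYU}} \eta(R, P_{XY}, Q_{XYU}, \phi).
\]

Then

\[
\liminf_{n \to \infty} \beta_n(R, \Delta, P_{XY}, d) \ge \beta(R, \Delta, P_{XY}, d).
\]

(Note in $\beta_n$ the optimizations are over types/conditional types and in $\beta$ over distributions.)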

Proof: One sees that $\kappa_2^n(Q_X) + \lambda_n = \kappa_2(Q_X) + o(1)$ is upper semicontinuous in $Q_X$. With this
established, the proof follows a similar proof for the Wyner-Ziv error exponent in [8]. ■

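Proof of Theorem 2: Define

\[
\mathcal{E} = \{d(X^n, \hat{X}^n) > \Delta\},
\]

then for our scheme we have

\[
P_e = \sum_{\mathbf{x},\mathbf{y},\mathbf{u}} P(\mathcal{E} \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F(Q_{\mathbf{xyu}})) \times P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F(Q_{\mathbf{xyu}}))
\]
\[
+ \sum_{\mathbf{x},\mathbf{y},\mathbf{u}} P(\mathcal{E} \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}})) \times P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}})).
\]

By definition, when $F$ occurs the encoder sends an error symbol, which we assume leads to the distortion
constraint being violated. Using this observation, and rewriting the above equation, first summing over
types then over sequences gives

\[
P_e \le \sum_{Q_{XYU}} \sum_{(\mathbf{x},\mathbf{y},\mathbf{u}) \in T_{Q_{XYU}}} P(\mathcal{E} \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{XYU})) \times P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{XYU}))
+ \sum_{Q_{XYU}} |T_{Q_{XYU}}|\, P(F(Q_{XYU})).
\]

On account of the fact that $P(F(Q_{XYU}))$ goes to zero super-exponentially for any choice of $Q_{XYU}$ and the
fact that there are only exponentially many sequences and polynomially many types, the final summand
can be safely ignored for the error exponent calculation. We use $a \preceq b$ to mean that

\[
\limsup_{n \to \infty} \frac{1}{n}\log a \le \limsup_{n \to \infty} \frac{1}{n}\log b.
\]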
Let

\[
P(\mathbf{x},\mathbf{y},\mathbf{u}) = P(X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}}))
\]

and

\[
P(\mathcal{E}|\mathbf{x},\mathbf{y},\mathbf{u}) = P(\mathcal{E} \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, U^n = \mathbf{u}, F^c(Q_{\mathbf{xyu}})).
\]

We now group the summation according to the sets outlined at the start of this section. This gives

\[
P_e \preceq \sum_{Q_X} \sum_{Q_Y} \Big[ \sum_{i=1}^{4} \sum_{Q_{XYU} \in \mathcal{D}_i} \ \sum_{(\mathbf{x},\mathbf{y},\mathbf{u}) \in T_{Q_{XYU}}} P(\mathbf{x},\mathbf{y},\mathbf{u})\, P(\mathcal{E}|\mathbf{x},\mathbf{y},\mathbf{u}) \Big],
\]

where in the inner summations over $Q_{XYU}$ on the sets $\mathcal{D}_i$ the types $Q_X$ and $Q_Y$ are fixed to be
those set by the outer summations. On the set $\mathcal{D}_1$, Lemma 9 implies the quantity $P(\mathbf{x},\mathbf{y},\mathbf{u})$ decays
super-exponentially. Since there are only polynomially many types and exponentially many sequences this term
can therefore be safely ignored. On the set $\mathcal{D}_3$, conditional on the event $F^c(Q_{\mathbf{xyu}})$, the codeword can be
decoded without error, and hence there is no error. Using the result of Lemmas 11 and 12 we therefore
have

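\[
P_e \preceq \sum_{Q_X} \sum_{Q_Y} \Big[ \sum_{Q_{XYU} \in \mathcal{D}_2} \exp\big(-n[D(Q_{XYU}\|P_{XY}Q_{U|X}) - \delta_n + [R - I_Q(X;U) + I_Q(Y;U) - \tilde{\delta}_n]^+ - \bar{\delta}_n]\big)
+ \sum_{Q_{XYU} \in \mathcal{D}_4} \exp\big(-n[D(Q_{XYU}\|P_{XY}Q_{U|X}) - \delta_n]\big) \Big],
\]

where $\bar{\delta}_n = -\frac{1}{n}\log(1 - \exp_e(-(n+1)^2))$. Bounding the summands by their maximum value gives

\[
P_e \preceq |\mathcal{P}^n(\mathcal{X})| \max_{Q_X} |\mathcal{P}^n(\mathcal{Y})| \max_{Q_Y} |\mathcal{P}^n(\mathcal{X} \times \mathcal{Y} \times \mathcal{U})|
\Big[ \max_{Q_{XYU} \in \mathcal{D}_2} \exp\big(-n[D(Q_{XYU}\|P_{XY}Q_{U|X}) - \delta_n + [R - I_Q(X;U) + I_Q(Y;U) - \tilde{\delta}_n]^+ - \bar{\delta}_n]\big)
\]
\[
+ \max_{Q_{XYU} \in \mathcal{D}_4} \exp\big(-n[D(Q_{XYU}\|P_{XY}Q_{U|X}) - \delta_n]\big) \Big]. \qquad (19)
\]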



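Let

\[
\hat{\delta}_n(Q_X) = \frac{1}{n}\log\big(\exp(n[\kappa_2^n(Q_X) + \lambda_n]) + 1\big) - (\kappa_2^n(Q_X) + \lambda_n)
\]

and let $\hat{\delta}_n$ be the maximum over $Q_X \in \mathcal{P}^n(\mathcal{X})$ of $\hat{\delta}_n(Q_X)$; it follows that $\hat{\delta}_n \to 0$. Adopting the definitions
from the statement of Lemma 13 and using $a + b \le 2\max(a,b)$ to combine the two sums of (19) gives

\[
P_e \preceq 2|\mathcal{P}^n(\mathcal{X})||\mathcal{P}^n(\mathcal{Y})||\mathcal{P}^n(\mathcal{X} \times \mathcal{Y} \times \mathcal{U})|
\times \max_{Q_X} \max_{Q_Y} \max_{Q_{XYU}:\, Q_{U|X} = Q^*_{U|X}(Q_X)} \exp(-n[\eta_n(R, P_{XY}, Q_{XYU}, \phi)]).
\]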
Finally, we can optimize over $Q^*_{U|X}$ and $\phi$, and move the optimizations in the exponent to give

\[
P_e \preceq 2|\mathcal{P}^n(\mathcal{X})||\mathcal{P}^n(\mathcal{Y})||\mathcal{P}^n(\mathcal{X} \times \mathcal{Y} \times \mathcal{U})|
\times \exp\big(-n\big[\min_{Q_X} \max_{Q_{U|X}} \min_{Q_Y} \max_{\phi} \min_{Q_{XYU}} \eta_n(R, P_{XY}, Q_{XYU}, \phi)\big]\big).
\]

Taking the log, dividing by $-n$ and then taking the $\liminf_{n \to \infty}$ of both sides, invoking Lemma 13 on the
right-hand side gives the result. ■